Strategies for Imputing Missing Values and Removing Outliers in the Dataset for Machine Learning-Based Construction Cost Prediction
Author(s): |
Haneul Lee
Seokheon Yun |
---|---|
Medium: | Journal article |
Language(s): | English |
Published in: | Buildings, 27 March 2024, n. 4, v. 14 |
Page(s): | 933 |
DOI: | 10.3390/buildings14040933 |
Abstract: |
Accurately predicting construction costs during the initial planning stages is crucial for the successful completion of construction projects. Recent advancements have introduced various machine learning-based methods to enhance cost estimation precision. However, accumulating authentic construction cost data is not straightforward, and existing datasets frequently contain a notable share of missing values, which makes precise cost prediction difficult. This study analyzes diverse substitution methods for addressing missing values in construction cost data. It also evaluates the performance of machine learning models in cost prediction after the removal of conditional outliers. The primary goal is to identify and propose optimal strategies for handling missing values in construction cost records, ultimately improving the reliability of cost predictions. According to the analysis results, median imputation is the most suitable of the single imputation methods, while lasso regression imputation produces the best results among the multiple imputation methods. This research contributes to the trustworthiness of construction cost predictions by presenting a pragmatic approach to managing missing data in construction cost performance records, thereby facilitating more precise project planning and execution. |
Copyright: | © 2024 by the authors; licensee MDPI, Basel, Switzerland. |
License: | This work has been published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license and may be copied, distributed, made publicly available, adapted, and edited under the terms of that license. The author or rights holder must be credited and the license terms must be complied with. |
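
The median imputation that the abstract singles out among single imputation methods can be illustrated with a minimal sketch. This is not the authors' code: the helper `impute_median`, the column names, and the sample records are all hypothetical and chosen only to show the idea of replacing each missing entry with the median of the observed values in the same column.

```python
from statistics import median


def impute_median(rows, columns):
    """Return copies of `rows` in which None entries of the given
    columns are replaced by that column's median over observed values.

    Hypothetical helper for illustration. It raises StatisticsError
    if a column has no observed values at all.
    """
    # Compute one median per column from the non-missing entries only.
    medians = {
        col: median(r[col] for r in rows if r[col] is not None)
        for col in columns
    }
    filled = []
    for r in rows:
        new_row = dict(r)  # copy so the input records stay untouched
        for col in columns:
            if new_row[col] is None:
                new_row[col] = medians[col]
        filled.append(new_row)
    return filled


# Made-up cost records; None marks a missing value.
records = [
    {"floor_area_m2": 1200.0, "storeys": 5, "cost": 3.1},
    {"floor_area_m2": None, "storeys": 7, "cost": 4.0},
    {"floor_area_m2": 900.0, "storeys": None, "cost": 2.2},
    {"floor_area_m2": 1500.0, "storeys": 9, "cost": 4.8},
]

filled = impute_median(records, ["floor_area_m2", "storeys"])
```

Because the median is insensitive to extreme values, this kind of fill is less distorted by unusually large or small projects than a mean-based fill would be, which is one plausible reason it performs well on skewed cost data.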
- About this data page
- Reference-ID: 10773928
- Published on: 29.04.2024
- Last modified on: 05.06.2024