Strategies for Imputing Missing Values and Removing Outliers in the Dataset for Machine Learning-Based Construction Cost Prediction
Author(s): | Haneul Lee, Seokheon Yun |
---|---|
Medium: | journal article |
Language(s): | English |
Published in: | Buildings, 27 March 2024, n. 4, v. 14 |
Page(s): | 933 |
DOI: | 10.3390/buildings14040933 |
Abstract: | Accurately predicting construction costs during the initial planning stages is crucial for the successful completion of construction projects. Recent advancements have introduced various machine learning-based methods to enhance cost estimation precision. However, the accumulation of authentic construction cost data is not straightforward, and existing datasets frequently exhibit a notable presence of missing values, posing challenges to precise cost predictions. This study aims to analyze diverse substitution methods for addressing missing values in construction cost data. Additionally, it seeks to evaluate the performance of machine learning models in cost prediction through the removal of conditional outliers. The primary goal is to identify and propose optimal strategies for handling missing values in construction cost records, ultimately improving the reliability of cost predictions. According to the analysis results, among single imputation methods, median imputation emerges as the most suitable, while among multiple imputation methods, lasso regression imputation produces the best outcomes. This research contributes to enhancing the trustworthiness of construction cost predictions by presenting a pragmatic approach to managing missing data in construction cost performance records, thereby facilitating more precise project planning and execution. |
Copyright: | © 2024 by the authors; licensee MDPI, Basel, Switzerland. |
License: | This creative work has been published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license which allows copying, and redistribution as well as adaptation of the original work provided appropriate credit is given to the original author and the conditions of the license are met. |
About this data sheet
Reference-ID: | 10773928 |
Published on: | 29/04/2024 |
Last updated on: | 05/06/2024 |