Use of data imputation tools to reconstruct incomplete air quality datasets: A case-study in Temuco, Chile

dc.contributor.author: Quinteros, María Elisa
dc.contributor.author: Lu, Siyao
dc.contributor.author: Blazquez, Carola
dc.contributor.author: Cárdenas-R, Juan Pablo
dc.contributor.author: Ossa, Ximena
dc.contributor.author: Delgado-Saborit, Juana-María
dc.contributor.author: Harrison, Roy M.
dc.contributor.author: Ruiz-Rudolph, Pablo
dc.date.accessioned: 2023-08-08T19:17:28Z
dc.date.available: 2023-08-08T19:17:28Z
dc.date.issued: 2019-03-01
dc.description: Scopus indexing
dc.description.abstract: Missing data from air quality datasets is a common problem, but is much more severe in small cities or localities. This poses a great challenge for environmental epidemiology as high exposures to pollutants worldwide occur in these settings and gaps in datasets hinder health studies that could later inform local and international policies. Here, we propose the use of imputation methods as a tool to reconstruct air quality datasets and have applied this approach to an air quality dataset from Temuco, a mid-size city in Chile, as a case study. We attempted to reconstruct the database comparing five approaches: mean imputation, conditional mean imputation, K-Nearest Neighbor imputation, multiple imputation and Bayesian Principal Component Analysis imputation. As a base for the imputation methods, linear regression models were fitted for PM2.5 against other air quality and meteorological variables. Methods were challenged against validation sets where data was removed artificially. Imputation methods were able to reconstruct the dataset with good performance in terms of completeness, errors, and bias, even when challenged against the validation sets. The performance improved when including covariates from a second monitoring station in Temuco. K-Nearest Neighbor imputation showed slightly better performance than multiple imputation for error (25% vs. 27%) and bias (2.1% vs. 3.9%), but presented lower completeness (70% vs. 100%). In summary, our results show that imputation methods can be a useful tool in reconstructing air quality datasets in a real-life situation.
dc.identifier.citation: Atmospheric Environment, Volume 200, Pages 40-49, 1 March 2019
dc.identifier.doi: 10.1016/j.atmosenv.2018.11.053
dc.identifier.issn: 1352-2310
dc.identifier.uri: https://repositorio.unab.cl/xmlui/handle/ria/52321
dc.language.iso: en
dc.publisher: Atmospheric Environment
dc.rights.license: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Air pollution
dc.subject: Environmental epidemiology
dc.subject: Missing data
dc.subject: Multiple imputation
dc.subject: Single imputation
dc.subject: Wood-burning
dc.title: Use of data imputation tools to reconstruct incomplete air quality datasets: A case-study in Temuco, Chile
dc.type: Article
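The validation strategy summarised in the abstract (remove known PM2.5 values artificially, reconstruct them, then score completeness, error and bias) can be illustrated with a minimal Python/NumPy sketch. It is not the authors' code: it implements only three of the five methods (mean imputation, regression-based conditional mean imputation, and K-Nearest Neighbour imputation), leaves out multiple imputation and Bayesian PCA, runs on synthetic toy data rather than the Temuco measurements, and the function names (`mean_impute`, `regression_impute`, `knn_impute`, `validate`) are hypothetical. The metric formulas (normalised mean absolute error, relative bias of the mean) are also assumptions, because the record does not define them.

```python
import numpy as np

def mean_impute(y):
    """Replace every missing value (NaN) with the mean of the observed values."""
    out = y.copy()
    out[np.isnan(out)] = np.nanmean(y)
    return out

def regression_impute(y, X):
    """Conditional-mean imputation: fit ordinary least squares of y on X
    using rows where everything is observed, then predict the missing y.
    Rows whose covariates are themselves missing are left as NaN."""
    out = y.copy()
    design = np.column_stack([np.ones(len(y)), X])
    cov_ok = ~np.isnan(design).any(axis=1)
    train = cov_ok & ~np.isnan(y)
    beta, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
    fill = cov_ok & np.isnan(y)
    out[fill] = design[fill] @ beta
    return out

def knn_impute(y, X, k=5):
    """K-nearest-neighbour imputation: each missing y becomes the mean y of
    the k donor rows closest in standardised covariate space."""
    out = y.copy()
    Z = (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0)
    cov_ok = ~np.isnan(Z).any(axis=1)
    donors = np.where(cov_ok & ~np.isnan(y))[0]
    for i in np.where(cov_ok & np.isnan(y))[0]:
        dist = np.linalg.norm(Z[donors] - Z[i], axis=1)
        out[i] = y[donors[np.argsort(dist)[:k]]].mean()
    return out

def validate(impute, y_true, mask):
    """Hide y_true[mask], impute, and score only the hidden points.
    The metric definitions below are assumptions for this sketch."""
    y_obs = y_true.copy()
    y_obs[mask] = np.nan
    y_hat = impute(y_obs)[mask]
    filled = ~np.isnan(y_hat)
    completeness = filled.mean()                    # share of hidden points recovered
    t, p = y_true[mask][filled], y_hat[filled]
    error = np.mean(np.abs(p - t)) / np.mean(t)     # normalised mean absolute error
    bias = (np.mean(p) - np.mean(t)) / np.mean(t)   # relative bias of the mean
    return completeness, error, bias

# Synthetic toy data (NOT the Temuco measurements): a PM2.5-like target
# driven by two covariates, with 5% of covariate values missing.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
y = 30 + 8 * X[:, 0] - 4 * X[:, 1] + rng.normal(scale=3, size=n)
X[rng.random((n, 2)) < 0.05] = np.nan
mask = rng.random(n) < 0.2  # validation set: ~20% of target values removed artificially

results = {
    "mean": validate(mean_impute, y, mask),
    "regression": validate(lambda v: regression_impute(v, X), y, mask),
    "knn": validate(lambda v: knn_impute(v, X), y, mask),
}
for name, (c, e, b) in results.items():
    print(f"{name:>10}: completeness={c:.0%} error={e:.1%} bias={b:+.1%}")
```

In this toy setup, the covariate-based methods cannot fill a gap when the covariates are missing at the same time, so their completeness falls below 100%, while mean imputation always fills every gap but ignores the covariates and therefore makes larger errors.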
Files
Original bundle
Name: 1-s2.0-S1352231018308367-main.pdf
Size: 2.16 MB
Format: Adobe Portable Document Format
Description: Full text in English
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission