Use of data imputation tools to reconstruct incomplete air quality datasets: A case-study in Temuco, Chile


By: Quinteros, M, Lu, S, Blazquez, C, Cardenas, J, Ossa, X, Delgado-Saborit, J, Harrison, R and Ruiz-Rudolph, P

Published: 1 Mar 2019
Abstract:
Missing data in air quality datasets is a common problem, but it is much more severe in small cities or localities. This poses a great challenge for environmental epidemiology, as high exposures to pollutants worldwide occur in these settings and gaps in datasets hinder health studies that could later inform local and international policies. Here, we propose the use of imputation methods as a tool to reconstruct air quality datasets and have applied this approach to an air quality dataset in Temuco, a mid-size city in Chile, as a case study. We attempted to reconstruct the database by comparing five approaches: mean imputation, conditional mean imputation, K-Nearest Neighbor imputation, multiple imputation and Bayesian Principal Component Analysis imputation. As a base for the imputation methods, linear regression models were fitted for PM2.5 against other air quality and meteorological variables. Methods were challenged against validation sets where data were removed artificially. Imputation methods were able to reconstruct the dataset with good performance in terms of completeness, errors, and bias, even when challenged against the validation sets. The performance improved when including covariates from a second monitoring station in Temuco. K-Nearest Neighbor imputation showed slightly better performance than multiple imputation for error (25% vs. 27%) and bias (2.1% vs. 3.9%), but presented lower completeness (70% vs. 100%). In summary, our results show that the imputation methods can be a useful tool in reconstructing air quality datasets in a real-life situation.
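
The validation idea in the abstract (remove known values artificially, impute them, then score the result) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code or data: the synthetic PM2.5 series, the two covariates (wind speed and temperature), the 20% removal rate, the choice of k = 5, and the exact definitions of percentage error and bias. Covariates are assumed complete and are left unscaled, which a real analysis would likely revisit.

```python
import math
import random
import statistics


def mean_impute(target, mask):
    """Fill hidden entries with the mean of the observed entries."""
    observed = [y for y, hidden in zip(target, mask) if not hidden]
    fill = statistics.fmean(observed)
    return [fill if hidden else y for y, hidden in zip(target, mask)]


def knn_impute(target, covariates, mask, k=5):
    """Fill hidden entries with the mean target of the k observed rows
    whose covariates are closest in Euclidean distance."""
    observed = [i for i, hidden in enumerate(mask) if not hidden]
    filled = list(target)
    for i, hidden in enumerate(mask):
        if not hidden:
            continue
        nearest = sorted(
            observed, key=lambda j: math.dist(covariates[i], covariates[j])
        )[:k]
        filled[i] = statistics.fmean(target[j] for j in nearest)
    return filled


def validation_scores(truth, filled, mask):
    """Normalized mean absolute error and normalized mean bias, in percent,
    computed only over the artificially removed entries (assumed definitions)."""
    pairs = [(t, f) for t, f, hidden in zip(truth, filled, mask) if hidden]
    mean_truth = statistics.fmean(t for t, _ in pairs)
    error = statistics.fmean(abs(f - t) for t, f in pairs) / mean_truth * 100
    bias = statistics.fmean(f - t for t, f in pairs) / mean_truth * 100
    return error, bias


# Synthetic data (hypothetical): PM2.5 falls with wind speed and temperature.
rng = random.Random(0)
covariates = [(rng.uniform(0, 5), rng.uniform(0, 20)) for _ in range(300)]
pm25 = [80 - 8 * wind - 1.5 * temp + rng.gauss(0, 3) for wind, temp in covariates]

# Validation set: hide roughly 20% of the PM2.5 values at random.
hidden = [rng.random() < 0.2 for _ in pm25]

mean_filled = mean_impute(pm25, hidden)
knn_filled = knn_impute(pm25, covariates, hidden, k=5)

mean_error, mean_bias = validation_scores(pm25, mean_filled, hidden)
knn_error, knn_bias = validation_scores(pm25, knn_filled, hidden)

print(f"mean imputation: error {mean_error:.1f}%, bias {mean_bias:.1f}%")
print(f"KNN imputation:  error {knn_error:.1f}%, bias {knn_bias:.1f}%")
```

Because the hidden values are known, the scores measure how well each method recovers them. In this sketch every hidden value can be filled, since all covariates are present; with real monitoring data, rows that also lack covariates could not be matched to neighbors, which is one plausible reason a neighbor-based method may reach lower completeness.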

Affiliations:
Quinteros, M:
 Univ Chile, Fac Med, Inst Salud Poblac, Programa Doctorado Salud Publ, Independencia 939, Santiago, Chile

 Univ Talca, Fac Ciencias Salud, Dept Salud Publ, Talca 3460000, Chile

Lu, S:
 Univ Michigan, Dept Environm Hlth Sci, 1415 Washington Hts, Ann Arbor, MI 48109 USA

Blazquez, C:
 Univ Andres Bello, Dept Engn Sci, Quillota 980, Vina Del Mar 2531015, Chile

Cardenas, J:
 Univ La Frontera, Inst Medio Ambiente, Dept Ingn Obras Civiles, Ave Francisco Salazar, Casilla 54-D, Temuco 01145, Chile

Ossa, X:
 Univ La Frontera, Dept Salud Publ, Caro Solar 115, Temuco, Chile

 Univ La Frontera, Ctr Excelencia CIGES, Caro Solar 115, Temuco, Chile

Delgado-Saborit, J:
 Univ Birmingham, Sch Geog Earth & Environm Sci, Div Environm Hlth & Risk Management, Birmingham B15 2TT, W Midlands, England

 ISGlobal Barcelona Inst Global Hlth, Barcelona Biomed Res Pk, Doctor Aiguader 88, Barcelona 08003, Spain

 Pompeu Fabra Univ, Placa Merce 10, Barcelona 08002, Spain

 Inst Salud Carlos III, Spanish Consortium Res Epidemiol & Publ Hlth CIBE, Ave Monforte de Lemos 5, E-28029 Madrid, Spain

Harrison, R:
 Univ Birmingham, Sch Geog Earth & Environm Sci, Div Environm Hlth & Risk Management, Birmingham B15 2TT, W Midlands, England

 King Abdulaziz Univ, Ctr Excellence Environm Studies, Dept Environm Sci, POB 80203, Jeddah 21589, Saudi Arabia

Ruiz-Rudolph, P:
 Univ Chile, Fac Med, Inst Salud Poblac, Programa Salud Ambiental, Independencia 939, Santiago, Chile
ISSN: 1352-2310





ATMOSPHERIC ENVIRONMENT
Publisher
Elsevier BV, THE BOULEVARD, LANGFORD LANE, KIDLINGTON, OXFORD OX5 1GB, ENGLAND, United Kingdom
Document type: Article
Volume: 200
Pages: 40-49
WOS Id: 000458225500005
Open access: Green Submitted
