National Environmental Public Health Tracking Network
Downscaler Ozone Metadata: Census Tract Level
Publication Date: 01/11/2017

Background
The Downscaler ozone dataset provides the output from a Bayesian space-time downscaling fusion model called Downscaler (DS). DS combines ozone monitoring data from the US EPA Air Quality System (AQS) repository of ambient air quality data (e.g., National Air Monitoring Stations/State and Local Air Monitoring Stations (NAMS/SLAMS)) with simulated ozone data from the deterministic prediction model Models-3/Community Multiscale Air Quality (CMAQ). The files contain the mean prediction and associated standard error for each 2010 US Census tract within the contiguous US for each day of the modeling year. The data are intended for professionals comparing air quality with health outcomes through techniques such as case-crossover analysis; other uses may be developed later. The standard errors of the predictions should be taken into account when using the results.

Data Values
The dataset includes nine variables:
STATEFIPS: State FIPS code
COUNTYFIPS: County FIPS code
CTFIPS: Census tract FIPS code
LATITUDE: Latitude of census tract centroid (degrees)
LONGITUDE: Longitude of census tract centroid (degrees)
YEAR: Year of prediction
DATE: Date (day-month-year) of prediction
DS_O3_PRED: Mean estimated 8-hour average ozone concentration in parts per billion (ppb) within 3 meters of the surface of the earth
DS_O3_STDD: Standard error of the estimated ozone concentration (ppb)

Geographic Scale & Scope
All census tracts in the contiguous United States

Time Period
January 1, 2001 to December 31, 2014

Raw Data Processing
The air quality monitoring data from the NAMS/SLAMS network were downloaded from the Air Quality System (AQS) database. Only Federal Reference Method (FRM) samplers were included in the dataset. Data from all Parameter Occurrence Codes (POC) were used.
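The nine variables listed under Data Values map naturally onto a typed record. The Python sketch below assumes rows have already been read (for example with csv.DictReader) into dictionaries keyed by the published variable names; the class name, helper names, and the default DATE format string are hypothetical, not part of the published files. It also shows one way to take the standard error into account, as the Background recommends: an approximate 95% interval of DS_O3_PRED plus or minus 1.96 times DS_O3_STDD, which assumes roughly normal prediction errors. The split_tract_geoid helper relies on the standard 11-digit census tract code (2-digit state, 3-digit county, 6-digit tract); whether CTFIPS holds the full 11-digit code or only the tract portion is not stated here, so check the files before using it.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Two-sided 95% quantile of the standard normal distribution. Treating the
# prediction error as normal is an assumption of this sketch.
Z_95 = 1.96


@dataclass(frozen=True)
class TractDayRecord:
    """One census-tract day holding the nine published variables (hypothetical type)."""
    statefips: str       # FIPS codes kept as strings so leading zeros survive
    countyfips: str
    ctfips: str
    latitude: float      # tract centroid, degrees
    longitude: float     # tract centroid, degrees
    year: int
    day: date
    ds_o3_pred: float    # mean predicted ozone, ppb
    ds_o3_stdd: float    # standard error of the prediction, ppb

    def interval_95(self) -> tuple[float, float]:
        """Approximate 95% interval: prediction +/- 1.96 standard errors."""
        half_width = Z_95 * self.ds_o3_stdd
        return (self.ds_o3_pred - half_width, self.ds_o3_pred + half_width)


def parse_record(row: dict[str, str], date_format: str = "%d-%m-%Y") -> TractDayRecord:
    """Build a record from one row keyed by the published variable names.

    The default date_format is an assumption; adjust it to the actual layout
    of the DATE values in the files you use.
    """
    return TractDayRecord(
        statefips=row["STATEFIPS"],
        countyfips=row["COUNTYFIPS"],
        ctfips=row["CTFIPS"],
        latitude=float(row["LATITUDE"]),
        longitude=float(row["LONGITUDE"]),
        year=int(row["YEAR"]),
        day=datetime.strptime(row["DATE"], date_format).date(),
        ds_o3_pred=float(row["DS_O3_PRED"]),
        ds_o3_stdd=float(row["DS_O3_STDD"]),
    )


def split_tract_geoid(geoid: str) -> tuple[str, str, str]:
    """Split an 11-digit census tract code into state (2), county (3), tract (6)."""
    code = geoid.zfill(11)  # restore a leading zero that numeric parsing may have dropped
    if len(code) != 11 or not code.isdigit():
        raise ValueError(f"not an 11-digit tract code: {geoid!r}")
    return code[:2], code[2:5], code[5:]
```

For example, a DS_O3_PRED of 40.0 ppb with a DS_O3_STDD of 2.0 ppb gives an approximate interval of 36.08 to 43.92 ppb.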
The data were downloaded covering January 1, 2001 through December 31, 2014. The CMAQ data were created with version 4.7.1 of the model using the Carbon Bond 05 (CB05) chemical mechanism. The CMAQ data are daily maximum 8-hour ozone concentrations calculated on a 12 km x 12 km grid for the continental United States. The CMAQ emissions data are based on the 2008 National Emissions Inventory (NEI) version 2, with specific updates including data from regional planning organizations and year-specific data for some larger point sources, including continuous emissions monitoring data for NOx and SO2 sources. The onroad mobile source emissions were generated using MOVES2010b, except for California, for which data provided by the California Air Resources Board were interpolated to each year. The meteorological data are from a 12 km simulation of the Weather Research and Forecasting Model (WRF) version 3.4. The WRF simulation used the Pleim-Xiu land surface model (LSM), the Asymmetric Convective Model version 2 (ACM2) planetary boundary layer (PBL) scheme, Morrison double-moment microphysics, the Kain-Fritsch cumulus parameterization scheme, and the RRTMG longwave and shortwave radiation (LWR/SWR) scheme. The CMAQ model results were developed in November 2013.

The DS combines the monitoring data and the CMAQ-estimated ozone concentration surface to predict ozone through space and time. It seeks an optimal linear relationship between CMAQ output and measurement data and uses it to predict new "measurements" at each spatial point in the area of interest. Fitted parameters are obtained by sampling from probability distributions (specified in the code by the developers) rather than by minimizing an objective function, which allows a standard error to be calculated for each prediction. DS differs from other fusion efforts in that it does not assume the existence of a true air pollution process driving both the monitoring data and the CMAQ output.
Instead, downscaling relates air monitoring data and model output using a linear regression with additive and multiplicative bias coefficients that can vary in space and time. This approach offers a new answer to the “change-of-support” problem, in which air pollution is to be predicted at a particular spatial resolution but point monitoring data must be reconciled with areal-average CMAQ concentrations. Model parameters are fit only to paired CMAQ and air monitoring data, so CMAQ grid cells that do not contain monitoring sites are not used in model fitting.

Additional processing was conducted to standardize variable names across all years of data and to split the FIPS variable into separate STATEFIPS, COUNTYFIPS, and CTFIPS variables.

Additional Information
The downscaler model described in the following papers is used to provide daily predictive PM2.5 (daily average) and O3 (daily 8-hr maximum) surfaces for 2010:
Berrocal, V., Gelfand, A. E., and Holland, D. M. (2011). Space-time fusion under error in computer model output: an application to modeling air quality.
Berrocal, V., Gelfand, A. E., and Holland, D. M. (2010). A bivariate space-time downscaler under space and time misalignment. The Annals of Applied Statistics 4, 1942-1975.
Berrocal, V., Gelfand, A. E., and Holland, D. M. (2010). A spatio-temporal downscaler for output from numerical models. Journal of Agricultural, Biological, and Environmental Statistics 15, 176-197.
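The linear regression with additive and multiplicative bias coefficients that vary in space and time, described under Raw Data Processing, can be written as a simplified sketch in the spirit of the Berrocal, Gelfand, and Holland papers. The notation is ours, and the prior distributions and space-time dependence placed on the local adjustments in the published model are omitted. Here Y(s, t) is the observed ozone at monitoring site s on day t, x(B_s, t) is the CMAQ value for the 12 km grid cell B_s containing s, beta_0 and beta_1 are overall additive and multiplicative coefficients, and beta_0(s, t) and beta_1(s, t) are local adjustments.

```latex
\begin{aligned}
Y(\mathbf{s}, t) &= \tilde{\beta}_0(\mathbf{s}, t) + \tilde{\beta}_1(\mathbf{s}, t)\, x(B_{\mathbf{s}}, t) + \varepsilon(\mathbf{s}, t),
  \qquad \varepsilon(\mathbf{s}, t) \sim N(0, \tau^2), \\
\tilde{\beta}_0(\mathbf{s}, t) &= \beta_0 + \beta_0(\mathbf{s}, t), \qquad
\tilde{\beta}_1(\mathbf{s}, t) = \beta_1 + \beta_1(\mathbf{s}, t).
\end{aligned}
```

Only grid cells that contain a monitor yield a pair of Y and x values, which is why cells without monitoring sites do not enter the fit. A prediction at a tract centroid then presumably combines the CMAQ value for the cell containing that centroid with the coefficients at that location.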
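Because the fitted parameters come from sampling rather than from an objective-function minimum, each prediction can be summarized by the mean and spread of its sampled values. The Python sketch below illustrates that idea for a single tract-day, assuming the published DS_O3_PRED and DS_O3_STDD correspond to the mean and standard deviation of such draws. The function names and the coefficient draws are hypothetical, and the residual error term is deliberately left out, so only coefficient uncertainty contributes to the spread.

```python
from statistics import mean, stdev


def predictive_draws(b0_draws: list[float], b1_draws: list[float], cmaq_ppb: float) -> list[float]:
    """Apply each sampled (additive, multiplicative) coefficient pair to one CMAQ value.

    Simplification: the residual error term is omitted, so the draws reflect
    only uncertainty in the bias coefficients at this location and day.
    """
    if len(b0_draws) != len(b1_draws):
        raise ValueError("need one multiplicative draw per additive draw")
    return [b0 + b1 * cmaq_ppb for b0, b1 in zip(b0_draws, b1_draws)]


def summarize(draws: list[float]) -> tuple[float, float]:
    """Return (mean prediction, standard error) as the mean and sample SD of the draws."""
    return mean(draws), stdev(draws)


# Hypothetical coefficient draws for one tract-day; real DS draws come from the model's sampler.
draws = predictive_draws([2.0, 3.0, 4.0], [0.75, 1.0, 1.25], cmaq_ppb=40.0)
pred, se = summarize(draws)
print(f"prediction {pred:.1f} ppb, standard error {se:.1f} ppb")
```

With thousands of draws instead of three, the same two summaries give a stable mean prediction and standard error for each location and day.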