An unprecedented number of wildfires in Canada caused unhealthy levels of air pollution in many areas of the US in late June and early July of this year. According to NBC News, as the smoke from these fires drifted across North America, air quality alerts were issued across the US covering the entire state of Minnesota; large parts of North Dakota, South Dakota, eastern Nebraska, Iowa, Illinois and Missouri; northeastern Colorado; and much of western and central Washington.
These fires also set the internet ablaze with blogs, news and Tweets. This is an understandable reaction given that the US does not commonly experience these levels of poor air quality. One blog went as far as saying that the air quality in Minnesota was worse than in Beijing. While this was true temporarily (due to the influence of the Canadian wildfires), in general, the air quality in the US is much better than in Beijing and other polluted cities of the world. It is hard to imagine how people in these cities are able to cope with these dangerous levels of air pollution on a regular basis. Having grown up in Mexico City, I can relate to those who routinely experience elevated levels of air pollution and poor visibility. The picture to the right shows an exceptional event in Mexico City, where an exceptional event was not a day with poor air quality but, to the contrary, a clear day when you could see the Popo and the Izta volcanoes that surround and watch over the city. The love story of these two scenic volcanoes can be read here.
However, when it comes to dispersion modeling, the exceptional events from the Canadian wildfires are anything but a love story. The high monitored observations resulting from these fires will most likely affect the design values of NOx, CO, PM2.5 and PM10 concentrations for the 2013-2015 period, because it is difficult to separate the contribution of exceptional events from that of the other sources in ambient air. States will be prompted to flag these data and request their exclusion from the design value calculations only when the NAAQS is exceeded. Therefore, the monitors affected by these wildfires will exhibit higher concentrations that will bias the monitored distribution and result in higher design values.
Data from these ambient monitors are used in cumulative dispersion modeling evaluations to account for background levels of air pollution. EPA’s recent guidance recommends combining the maximum or design value concentration from the monitor with the concentrations from AERMOD. As an example, PM2.5 is one of the most challenging pollutants to model due, in part, to high background concentrations that can be 25 μg/m3 or more. Since the 24-hour PM2.5 standard is 35 μg/m3, in a cumulative analysis this leaves only 10 μg/m3 to account for the source being modeled and its neighbors. However, as I have described in numerous talks and presentations, combining the design value from the monitor with the design value from AERMOD describes a joint event that is extremely unlikely to happen. This is akin to rolling dice and waiting for a “six”. It is hard enough to get a “six” on a roll of one die (one chance out of six), but it is even more difficult to get two “sixes” when rolling two dice at the same time (one chance out of 36 = 1/6 * 1/6). Likewise, the 98th percentile modeled concentration is far less likely to occur at the same time as the 98th percentile monitored concentration than either is to occur on its own.
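The dice arithmetic above can be sketched in a few lines of Python. This is only an illustration: it assumes the two events are independent, and the function name joint_probability is a hypothetical helper, not part of AERMOD or any EPA tool. Treating "at or above the 98th percentile" as a 2% tail on each side is also a simplifying assumption.

```python
from fractions import Fraction


def joint_probability(p_a, p_b):
    """Probability that two independent events occur together."""
    return p_a * p_b


# Two dice: chance that both show a six on the same roll.
one_six = Fraction(1, 6)
two_sixes = joint_probability(one_six, one_six)
print(two_sixes)  # 1/36

# Assumption: each 98th percentile value is treated as a 2% tail event.
# Chance that the modeled and the monitored tail events coincide.
tail = Fraction(2, 100)
both_tails = joint_probability(tail, tail)
print(both_tails, float(both_tails))  # 1/2500 0.0004
```

Under these assumptions, the coincidence of the two 98th percentile values is about 50 times less likely than either value on its own.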
Normal and Skewed distributions
Many phenomena in nature approximate the normal distribution. If we use the height of people as an example, we will find that a few people are very short, many people are average height, and a few people are very tall. If such a sampling is done at random, the distribution of heights will look similar to the normal distribution shown below.
However, when a normal distribution is influenced by very high observations, the distribution is “stretched” to the right. This is called a positively skewed distribution. Even a few very high observations will cause the distribution to get “stretched” to the right. This means that the 98th (or 99th) percentile will significantly increase as high observations are included in the distribution. While the inclusion of these high observations may be fine for determining NAAQS compliance, these extreme values are not suitable for use as background in cumulative modeling evaluations.
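The effect of a few very high observations on the upper percentiles can be illustrated with synthetic data. The sketch below is not based on real monitoring data: the baseline year (mean 10, standard deviation 2), the ten "smoke days" at 60, and the nearest-rank percentile helper are all assumptions chosen for illustration, and a regulatory design value calculation follows its own procedures.

```python
import math
import random
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


random.seed(0)  # fixed seed so the synthetic year is reproducible

# Hypothetical year of daily concentrations, roughly normal.
baseline = [random.gauss(10.0, 2.0) for _ in range(365)]

# Same year, but ten days are replaced by very high "smoke day" values.
smoky = [60.0] * 10 + baseline[10:]

p98_base = percentile(baseline, 98)
p98_smoke = percentile(smoky, 98)
median_base = statistics.median(baseline)
median_smoke = statistics.median(smoky)

print("98th percentile:", round(p98_base, 1), "->", round(p98_smoke, 1))
print("median:         ", round(median_base, 1), "->", round(median_smoke, 1))
```

In this synthetic case, ten high days out of 365 are enough to move the 98th percentile all the way up to the smoke-day value, while the median barely changes.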
That is why I have proposed the use of the 50th percentile to account for background concentrations in cumulative modeling evaluations instead of the design value (i.e., 98th or 99th percentile). Why did I choose the 50th percentile? Well, the 50th percentile is the median. The median is the value where half of the observations are above and half of them are below that value. The median is less influenced by exceptional events. This is one of the reasons why home values are commonly reported as median values and not average values. The median is a better measure than the average (mean), which is heavily influenced by one or a few home prices that are significantly higher than the rest of the homes in an area.
Additionally, pairing the 50th percentile monitored concentration with the design value from AERMOD results in a combined probability that is more conservative than the design value of the NAAQS. Just like the price of homes, monitored distributions are affected by unusually high observations that “stretch” the distribution to the right. The 50th percentile (median) is actually higher than the most likely value in a positively skewed distribution, as depicted in the figure above. Thus, the median is still a conservative value.
The predicted and monitored distributions are independent of each other. This independence has been shown by comparing them on a temporal and spatial basis, in other words, hour by hour and receptor by receptor. This temporal and spatial mismatch prompted EPA to evaluate model performance with Q-Q plots. Q-Q plots are created by comparing the ranks of the modeled and monitored values irrespective of time and space. This means that the maximum value from the monitor is compared to the maximum value from the model. The same is done for the 2nd highest values and all subsequent ranks. In this pairing exercise, the location and time of each of the ranked values will be different, since these values are decoupled in time and space. In summary, the model is evaluated with Q-Q plots because of the lack of correlation between observed and modeled values.
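The rank pairing behind a Q-Q plot can be sketched as follows. The concentrations are made-up numbers and qq_pairs is a hypothetical helper; the sketch assumes both series have the same number of values and simply drops the unmatched low ranks if they do not.

```python
def qq_pairs(observed, modeled):
    """Pair observed and modeled values by rank, ignoring time and place."""
    obs_ranked = sorted(observed, reverse=True)
    mod_ranked = sorted(modeled, reverse=True)
    # zip stops at the shorter series, so only matching ranks are paired.
    return list(zip(obs_ranked, mod_ranked))


# Hypothetical concentrations listed in time order.
observed = [12.0, 30.5, 8.2, 19.7, 15.1]
modeled = [25.0, 9.5, 14.3, 33.1, 11.8]

pairs = qq_pairs(observed, modeled)
for rank, (obs, mod) in enumerate(pairs, start=1):
    print(rank, obs, mod)
```

Here the highest observed value (second in time) is paired with the highest modeled value (fourth in time), which shows how the pairing discards when each value occurred.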
What does EPA say about Exceptional Events?
The EPA defines exceptional events as unusual or naturally occurring events that can affect air quality but are not reasonably controllable. Here is the complete definition from 72 FR 13560.
U.S. EPA (2007) “Treatment of Data Influenced by Exceptional Events; Final Rule” (72 FR 13560) pursuant to the 2005 amendment of CAA Section 319.
EPA is defining the term ‘‘exceptional event’’ to mean an event that:
(i) Affects air quality;
(ii) Is not reasonably controllable or preventable;
(iii) Is an event caused by human activity that is unlikely to recur at a particular location or a natural event; and
(iv) Is determined by EPA through the process established in these regulations to be an exceptional event.
It is important to note that natural events, which are one form of exceptional events according to this definition, may recur, sometimes frequently (e.g., western wildfires). For the purposes of this rule, EPA is defining ‘‘natural event’’ as an event in which human activity plays little or no direct causal role to the event in question.
The high readings stemming from the Canadian fires qualify as exceptional events. However, State agencies only flag exceptional events when there are attainment issues. Therefore, the data collected from these monitors are likely to contain observations that will overstate background concentrations in modeling analyses.
Natural events such as wildfires and volcanic activity put things in perspective, since these can have significant effects on the air quality of a region. In cases such as the Canadian wildfires, air pollution from industrial activity is dwarfed by the pollution from these events. Additionally, achieving compliance in cumulative modeling evaluations will be much more difficult unless the hours affected by the Canadian fires are excluded from the design value calculations. These hours should be excluded from design value calculations based on EPA’s guidance related to exceptional events. Another, more reasonable way to determine background concentrations for dispersion modeling evaluations is to use the median instead of the design value, since the median is less influenced by extreme events (e.g., wildfires). Furthermore, pairing median values with the 99th or 98th percentile value from AERMOD results in a combined probability that is more conservative than the form of the probabilistic 1-hour and 24-hour NAAQS.