Thursday 19 February 2015

Lab 1: Z-Scores, Mean Center, and Standard Distance

Introduction/Problem/Research Question/s:

A firm has enlisted geographers to evaluate the geography and distribution of tornadoes in Kansas and Oklahoma. The states would like to mandate the building of tornado shelters in areas that have experienced a large number of tornadoes. However, some argue that shelters are unnecessary because of their cost and their likely lack of use in some areas. The goal is to locate areas with a high probability of tornadoes and assess whether shelters would be necessary there. In addition, a basis is needed for deciding where it is most appropriate to require tornado shelters. This involves calculating statistics from the given files, which include tornado locations and widths, and paying special attention to patterns over time.

Methods:

Four main tools were used to analyze the spatial data of tornadoes in Oklahoma and Kansas. The first is the mean center: the average of the x coordinates and the average of the y coordinates, which together form a hypothetical point marking the average location at which a tornado occurred. The second is the weighted mean center, which weights each point by an attribute value, in this case the width of the tornado. Comparing this point with the mean center shows whether, and in which direction, the weighting shifts the center. In some cases the weighted mean center may be more informative than the mean center, but knowing both gives a better understanding of the spatial data.
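To make the two centers concrete, here is a minimal sketch of the textbook formulas (mean center = average x and average y; weighted mean center = sum(w * x) / sum(w) and sum(w * y) / sum(w)). It assumes projected (planar) coordinates; ArcMap's Mean Center tool is assumed to follow the same definitions. The coordinates and widths are hypothetical, not taken from the lab data.

```python
from typing import Sequence, Tuple

def mean_center(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Mean center: the average x coordinate and the average y coordinate."""
    n = len(xs)
    return sum(xs) / n, sum(ys) / n

def weighted_mean_center(xs: Sequence[float], ys: Sequence[float],
                         ws: Sequence[float]) -> Tuple[float, float]:
    """Weighted mean center: sum(w * x) / sum(w) and sum(w * y) / sum(w)."""
    total_w = sum(ws)
    x_w = sum(w * x for x, w in zip(xs, ws)) / total_w
    y_w = sum(w * y for y, w in zip(ys, ws)) / total_w
    return x_w, y_w

# Hypothetical projected coordinates (km) and widths (feet) for four tornadoes.
xs = [0.0, 10.0, 10.0, 0.0]
ys = [0.0, 0.0, 10.0, 10.0]
widths = [300.0, 100.0, 100.0, 100.0]

print(mean_center(xs, ys))                   # (5.0, 5.0)
print(weighted_mean_center(xs, ys, widths))  # about (3.33, 3.33): pulled toward the widest tornado at (0, 0)
```

The unweighted center sits in the middle of the square, while the weighted center moves toward the widest tornado, which is the same kind of shift the maps below compare.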

The third tool is the standard distance, which is essentially a spatial version of the standard deviation: it measures how dispersed the points are around the mean center. In ArcMap, one can choose how many standard deviations to display; in this lab, only a one-standard-deviation circle is shown. The closer a point is to the middle of this circle, the closer it is to the mean center. An important note is that, unlike a z-score, which is negative for values below the mean, the standard distance cannot be negative, because it measures distance without direction. The fourth tool, the weighted standard distance, works in a similar way to the weighted mean center. A map cannot have a weighted standard distance without a weighted mean center, because the weighted standard distance is measured around the weighted mean center.
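A minimal sketch of the standard distance, assuming projected coordinates and the usual population form: the square root of the (weighted) mean squared x deviation plus the (weighted) mean squared y deviation. ArcMap's Standard Distance tool is assumed to use the same definition; the points and widths are hypothetical. The weighted version measures deviations from the weighted mean center, which is why one cannot exist without the other.

```python
import math
from typing import Optional, Sequence

def standard_distance(xs: Sequence[float], ys: Sequence[float],
                      ws: Optional[Sequence[float]] = None) -> float:
    """Standard distance; pass weights for the weighted version.

    With ws=None every point gets weight 1, which gives the unweighted
    standard distance around the ordinary mean center.
    """
    if ws is None:
        ws = [1.0] * len(xs)
    total_w = sum(ws)
    # The weighted standard distance is measured around the weighted mean center.
    cx = sum(w * x for x, w in zip(xs, ws)) / total_w
    cy = sum(w * y for y, w in zip(ys, ws)) / total_w
    var_x = sum(w * (x - cx) ** 2 for x, w in zip(xs, ws)) / total_w
    var_y = sum(w * (y - cy) ** 2 for y, w in zip(ys, ws)) / total_w
    # Square root of a sum of squares, so the result is never negative.
    return math.sqrt(var_x + var_y)

# Hypothetical projected coordinates (km) and widths (feet).
xs = [0.0, 10.0, 10.0, 0.0]
ys = [0.0, 0.0, 10.0, 10.0]
widths = [300.0, 100.0, 100.0, 100.0]
print(round(standard_distance(xs, ys), 2))          # 7.07
print(round(standard_distance(xs, ys, widths), 2))  # 6.67
```

The radius of the one-standard-deviation circle drawn on the maps corresponds to this value.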


Results:

Map 1a displays the mean center and weighted mean center of tornadoes in Kansas and Oklahoma from 1995 to 2006. The mean center, shown in pink, marks the average x and y coordinates of the tornado locations. The weighted mean center is weighted by the width of each tornado. On this map, the weighted mean center lies slightly south and west of the mean center, indicating that the wider tornadoes tend to occur farther south and west. This is also visible to the naked eye: the larger yellow graduated circles, which represent the wider tornadoes, are concentrated in the southern region.


Map 1a. Mean center and weighted mean center of tornadoes in Kansas and Oklahoma from 1995 to 2006
After evaluating the maps, it became clear that a number of the mapped tornadoes had a recorded width of zero. As a result, some of the points that appear to be small tornadoes are not actually tornadoes at all. This led to the creation of Map 1b, which shows the same data as Map 1a after removing the tornadoes with a width of zero. The mean center moved slightly northward, indicating that more of these false tornadoes were located in the south. The weighted mean center, however, did not change. Figure 1 zooms in on Map 1b in the region of the mean centers, giving a closer view of how the centers shift with and without the zero-width records.



Map 1b. Tornadoes from 1995 to 2006 after removing those with a width of zero. The mean center shifts slightly compared with the full data set (widths of 0-1780 feet).
Figure 1. Zoomed-in view of the reconfigured '95-'06 tornadoes. When looking at the maps, it became clear that some tornadoes had a width of zero, so the data were reconfigured to remove them. This produced a slight change in the mean center (the green and magenta circles) but no change in the weighted mean center (the blue squares). The weighted mean center does not change because a tornado with a width of zero has a weight of zero and therefore contributes nothing to the weighted sums.
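The effect of dropping zero-width records can be checked with a small sketch. The records below are hypothetical (x, y, width) tuples, with the zero-width points placed in the south to mirror the pattern described above; y is assumed to increase northward. Removing them moves the unweighted mean center north, while the weighted mean center is identical with or without them, because each zero weight adds nothing to either the numerator or the denominator.

```python
def mean_center(points):
    """Unweighted mean center of (x, y, width) records."""
    n = len(points)
    return (sum(x for x, _, _ in points) / n, sum(y for _, y, _ in points) / n)

def weighted_mean_center(points):
    """Mean center weighted by the width field of each record."""
    total_w = sum(w for _, _, w in points)
    return (sum(w * x for x, _, w in points) / total_w,
            sum(w * y for _, y, w in points) / total_w)

# Hypothetical records: (x_km, y_km, width_ft); y increases northward.
tornadoes = [
    (0.0, 10.0, 200.0),
    (10.0, 10.0, 100.0),
    (5.0, 0.0, 0.0),   # zero-width record in the south
    (5.0, 2.0, 0.0),   # zero-width record in the south
]
nonzero = [t for t in tornadoes if t[2] > 0]

print(mean_center(tornadoes), mean_center(nonzero))  # (5.0, 5.5) (5.0, 10.0)
print(weighted_mean_center(tornadoes) == weighted_mean_center(nonzero))  # True
```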
Map 2 shows the mean center and weighted mean center for tornadoes in Kansas and Oklahoma from 2007 to 2012. This map reaffirms that the wider tornadoes tend to occur south of the mean center, because the weighted mean center lies below (south of) the mean center. Further, more tornadoes occur in Kansas than in Oklahoma, but they tend to be smaller, which likely means less damage and fewer lives lost.
Map 2. Mean and Weighted Mean Center of Tornadoes in Kansas and Oklahoma from 2007 to 2012
Map 3 brings both of the prior maps together so that the data can be compared on one map. For this map in particular, it was difficult to find symbology and colors that kept every layer clearly visible. A trend visible in this map is that both the mean center and the weighted mean center of the '07-'12 tornadoes are pulled toward Kansas. This suggests that Kansas faces a greater threat of tornado damage than Oklahoma.
Map 3. Mean and Weighted Center of Tornadoes in Kansas and Oklahoma from 1995 to 2012

Map 4 displays the standard distance and weighted standard distance for tornadoes in Kansas and Oklahoma from 1995 to 2006. As in the maps above, weighting by width separates the two distance circles from one another. This is consistent with the earlier finding that weighting by width pulls the results farther south, here shown by the weighted standard distance lying farther south than the standard distance.
Map 4. Standard and Weighted Distance for Tornadoes in Kansas and Oklahoma from 1995 to 2006.
Map 5 shows the standard distance and weighted standard distance for tornadoes in Kansas and Oklahoma from 2007 to 2012. Although the distance circles have moved north (just like the mean center and weighted mean center for the 2007-2012 data in Map 3), the wider tornadoes are located farther south and east. After seeing this map, it is difficult to determine where a tornado shelter would be most suitable, because a larger distance separates these circles than on the previous map.
Map 5. Standard and Weighted Distance for Tornadoes in Kansas and Oklahoma from 2007 to 2012.
Map 6 shows the weighted standard distance and weighted mean center for tornadoes in Kansas and Oklahoma from 1995 to 2012. Several recurring trends are visible in this map. Overall, tornadoes from 2007 to 2012 were weighted more toward the northeast than those from 1995 to 2006. Depending on the weather over these years, one might want to rely on one set of data over the other. However, if this shift reflects global warming and the north becoming warmer than in the past, then the more recent weighted mean center may be the more reliable guide.
Map 6. Weighted Standard Distance and Mean for Tornadoes in Kansas and Oklahoma from 1995 to 2012.
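Whether a center "moved northeast" can be stated more precisely as a distance and a compass bearing between the two centers. This is a minimal sketch with hypothetical projected coordinates (km), assuming y increases northward and x eastward; the function names are illustrative, not from ArcMap.

```python
import math

def center_shift(old, new):
    """Distance and compass bearing from an old center to a new one.
    Assumes projected coordinates with y increasing northward."""
    dx = new[0] - old[0]
    dy = new[1] - old[1]
    distance = math.hypot(dx, dy)
    # atan2(dx, dy) gives the angle clockwise from north (0 = north, 90 = east).
    bearing = math.degrees(math.atan2(dx, dy)) % 360
    return distance, bearing

def compass_point(bearing):
    """Round a bearing to one of eight compass directions."""
    names = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return names[int((bearing + 22.5) // 45) % 8]

# Hypothetical weighted mean centers (km) for the two periods.
center_1995_2006 = (500.0, 300.0)
center_2007_2012 = (530.0, 340.0)
d, b = center_shift(center_1995_2006, center_2007_2012)
print(f"{d:.1f} km toward {compass_point(b)} ({b:.1f} deg)")  # 50.0 km toward NE (36.9 deg)
```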
Map 7 depicts the number of tornadoes per county in Kansas and Oklahoma from 2007 to 2012, classified by standard deviation. Eight counties fall more than 1.5 standard deviations above the mean, meaning that they see an unusually large number of tornadoes compared with the average county; the class around the mean is shown in yellow on the map. Interestingly, the dark blue counties on the border between Kansas and Oklahoma sit at the center of the weighted mean centers for both time periods, whereas a few outliers lie near the very edge of the weighted standard distance circle.
Map 7. Standard Deviation of Tornadoes (by count) in Kansas and Oklahoma from 2007 to 2012.
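A standard deviation classification like the one on Map 7 can be sketched as converting each county count to a z-score and binning it. This assumes classes one standard deviation wide centered on the mean (a middle class from -0.5 to 0.5 SD and a top class above 1.5 SD); ArcMap's exact breaks may differ. It uses the mean (4) and standard deviation (4.3) this lab reports for the county counts; the counts 5, 13, and 25 are the lab's Alfalfa, Caddo, and Russell values, while 0 and 8 are hypothetical.

```python
def z_score(count: float, mean: float, sd: float) -> float:
    """Number of standard deviations a county's count lies from the mean."""
    return (count - mean) / sd

def sd_class(count: float, mean: float, sd: float) -> str:
    """Assign a count to a standard-deviation class.

    Assumption: classes are one standard deviation wide and centered on the
    mean, so the middle class runs from -0.5 to +0.5 SD.
    """
    z = z_score(count, mean, sd)
    if z < -0.5:
        return "< -0.5 SD"
    if z <= 0.5:
        return "-0.5 to 0.5 SD"
    if z <= 1.5:
        return "0.5 to 1.5 SD"
    return "> 1.5 SD"

MEAN, SD = 4.0, 4.3  # summary statistics reported for the county counts
for count in [0, 5, 8, 13, 25]:
    print(count, sd_class(count, MEAN, SD))
```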
A z-score measures how many standard deviations an individual observation lies from the mean: z = (x - mean) / standard deviation. Three counties were chosen to have their z-scores calculated: Russell County, Kansas; Caddo County, Oklahoma; and Alfalfa County, Oklahoma. The standard deviation was found by creating a standard deviation choropleth map and reading the statistics given in its output: the standard deviation was 4.3 and the mean was 4 tornadoes per county. The results were as follows: Russell, with 25 tornadoes, has a z-score of 4.88; Caddo, with 13 tornadoes, has a z-score of 2.09; and Alfalfa, with 5 tornadoes, has a z-score of .23. Russell has the highest z-score of the three and is one of the dark blue counties more than 1.5 standard deviations above the mean; its z-score is so high because very few counties would be expected to record that many tornadoes. Alfalfa's tornado count, by contrast, is barely above the mean, so its z-score is very small, showing that 5 tornadoes in one county is only slightly more than average.
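The z-score arithmetic above can be checked with a few lines of Python. This sketch uses the reported mean of 4 and standard deviation of 4.3, and a count of 5 for Alfalfa, the value consistent with its z-score of .23 and with the 5 tornadoes mentioned for it (a count of 4 would give a z-score of 0).

```python
MEAN = 4.0  # mean tornadoes per county, from the choropleth statistics
SD = 4.3    # standard deviation of tornadoes per county

def z_score(x: float, mean: float, sd: float) -> float:
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

counts = {"Russell, KS": 25, "Caddo, OK": 13, "Alfalfa, OK": 5}
for county, n in counts.items():
    print(f"{county}: z = {z_score(n, MEAN, SD):.2f}")
# Russell, KS: z = 4.88
# Caddo, OK: z = 2.09
# Alfalfa, OK: z = 0.23
```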

The next task was to find the number of tornadoes per county that would be exceeded 70% of the time. To do this, 70% was looked up in the z-score table, giving a z-score of .52. Because a value that is exceeded more often than not lies below the mean (on the negative side of the distribution), the z-score was changed to -.52. Solving x = mean + z × standard deviation = 4 + (-.52)(4.3) gives about 1.76 tornadoes per county, so a county would need about 1.76 tornadoes to be exceeded 70% of the time. Another task was to find the number of tornadoes per county that would be exceeded only 20% of the time. A county above this value would be in the top 20% of counties, so the z-score must be positive and fairly large. In this case, 80% was found in the z-score table at .84, and 4 + (.84)(4.3) gives about 7.6 tornadoes per county.
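The table lookups can be reproduced with Python's standard-library `statistics.NormalDist`, whose `inv_cdf` returns the value below which a given fraction of a normal distribution falls. This sketch assumes, as the lab does, that tornadoes per county are approximately normal with mean 4 and standard deviation 4.3; real counts are non-negative and skewed, so this is only a rough model. "Exceeded 70% of the time" is the 30th percentile, and "exceeded 20% of the time" is the 80th percentile.

```python
from statistics import NormalDist

MEAN, SD = 4.0, 4.3  # county tornado-count mean and SD reported in the lab

# Normal approximation of tornadoes per county (an assumption, see above).
counts = NormalDist(mu=MEAN, sigma=SD)

exceeded_70 = counts.inv_cdf(0.30)  # 30th percentile
exceeded_20 = counts.inv_cdf(0.80)  # 80th percentile
print(f"exceeded 70% of the time: {exceeded_70:.2f}")  # 1.75
print(f"exceeded 20% of the time: {exceeded_20:.2f}")  # 7.62

# Same calculation with the rounded table values: x = mean + z * SD
print(f"{MEAN + (-0.52) * SD:.2f}, {MEAN + 0.84 * SD:.2f}")  # 1.76, 7.61
```

The small differences from 1.76 and 7.6 come only from rounding the z-scores to two decimals in the table.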


Conclusion:

Overall, all of the methods used above are related to one another. Generally, the weighted mean centers lie south of the unweighted mean centers, while over time both the weighted mean centers and the weighted standard distances have shifted toward the northeast.

This study has large implications not only for the states' budgets but also for the survival rates of their citizens. It is understandable that the general population might consider these shelters unnecessary, but if the statistics say otherwise, then the likelihood that a shelter will be used and save lives is greatly increased. The difficult position for a statistical researcher is deciding where the cutoff lies for whether a community should have a tornado shelter. At the state level, one obviously does not want to overlook important details and put certain people at risk as a result.

The strength of each tornado and the damage it causes play a big role in determining where a tornado shelter would be most suitable. Most would assume that wider tornadoes cause more destruction and loss of life, but it would be interesting to see how strength and damage data fit into the distribution of tornadoes above. Weather data, such as temperature, heat index, and dew point, would also be useful for examining tornado trends over time, since it could show whether these conditions correlate with the shift in tornado widths.

If I could redo one part of this lab, I would clean the 1995-2006 tornado width data so that no tornadoes had a width of 0. Although this did not appear to affect my maps much, I would still have liked to make them as accurate as possible. Unlike in the last lab, we did not look at any raw data in Excel beforehand, so I wrongly assumed that the data could be used as is. This shows that I need to take more time to examine the raw data before analyzing its spatial meaning.

As a recommendation, I would encourage Oklahoma and Kansas to place tornado shelters in the dark blue regions of Map 7, that is, the counties more than 1.5 standard deviations above the mean. Considering the trend over time, I would likely place more tornado shelters to the north and east of the weighted standard distance, because stronger tornadoes appear to be occurring in that region. Looking at the data, there is no place where I would say a tornado shelter is unnecessary. That said, there are other areas with relatively high numbers of tornadoes, and those should also receive shelters if there is room in the budget.
