What is this?
Gale Primary Sources provides a digitized archive of all editions of The Times (UK) from 1785 until 2020. To isolate what we consider to be "crisis-related coverage", we ran a keyword search over the headlines and first 100 words of the articles in the corpus, starting with the search term "cris!s". Based on an assessment of the matching articles, we expanded the list of crisis-related keywords and repeated the process.
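As a rough illustration of this filtering step, the sketch below checks a keyword list against the headline and first 100 words of each article. It is not the code used for the project: the record layout (`headline`, `body`), the function names, and the sample articles are invented for the example, and the regular expression `cris[ei]s` is only an assumed stand-in for the "cris!s" search term.

```python
import re

def leading_text(headline, body, n_words=100):
    """Return the headline plus the first n_words words of the body."""
    return headline + " " + " ".join(body.split()[:n_words])

def is_crisis_related(article, keywords):
    """True if any keyword pattern matches the headline or first 100 words."""
    text = leading_text(article["headline"], article["body"])
    return any(
        re.search(rf"\b{kw}\b", text, flags=re.IGNORECASE) for kw in keywords
    )

# Assumed seed pattern standing in for "cris!s" (covers "crisis" and "crises").
# After reviewing the matches, further keywords would be appended to this list
# and the filter run again.
keywords = [r"cris[ei]s"]

# Made-up sample records, not taken from the archive.
articles = [
    {"headline": "The Banking Crisis", "body": "Depositors gathered outside the bank."},
    {"headline": "Court Circular", "body": "The council met this morning."},
]

crisis_articles = [a for a in articles if is_crisis_related(a, keywords)]
```

Only the first sample record passes the filter. A keyword that first appears after the 100th word of the body is ignored, which mirrors the search window described above.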
Next, Named Entity Recognition (NER) was used to identify mentions of places in this corpus of crisis-related coverage. Separately, GeonamesCache (GNC) and GADM were used to generate a list of "known places". Each NER place was then compared against the list of known places using a combination of spellchecking and fuzzy matching; since both GNC and GADM already include geocodes, every matched location was geocoded immediately.
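The matching step might look roughly like the following. This is a minimal sketch under stated assumptions: it uses `difflib.get_close_matches` from the Python standard library as a stand-in for whatever spellchecking and fuzzy-matching tools were actually used, the `known_places` dictionary is a tiny hand-written substitute for the list built from GNC and GADM (coordinates are approximate), and the `cutoff` value is arbitrary.

```python
from difflib import get_close_matches

# Hand-written stand-in for the GNC/GADM list: name -> (latitude, longitude).
known_places = {
    "london": (51.5074, -0.1278),
    "paris": (48.8566, 2.3522),
    "vienna": (48.2082, 16.3738),
}

def geocode(ner_place, known, cutoff=0.8):
    """Return the geocode of the best-matching known place, or None."""
    key = ner_place.strip().lower()
    if key in known:
        # Exact match: no fuzzy matching needed.
        return known[key]
    # Fuzzy match: take the single closest name above the similarity cutoff.
    candidates = get_close_matches(key, list(known), n=1, cutoff=cutoff)
    return known[candidates[0]] if candidates else None

exact = geocode("Vienna", known_places)
fuzzy = geocode("Londn", known_places)       # misspelled, still resolves to London
missing = geocode("Atlantis", known_places)  # no sufficiently close known place
```

Because every known place already carries coordinates, a successful match yields a geocode directly, with no separate geocoding request. A stricter cutoff reduces false matches at the cost of dropping more misspelled names.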
Finally, the points were aggregated to the national level and filtered by year, yielding the dataset that this map visualizes.
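For illustration, the aggregation can be thought of as counting geocoded mentions per country for a chosen year. The record layout (`year`, `country`) and the sample values below are invented for the example and do not come from the dataset.

```python
from collections import Counter

# Made-up geocoded mentions; each one carries a year and a country code.
mentions = [
    {"year": 1931, "country": "GB"},
    {"year": 1931, "country": "DE"},
    {"year": 1931, "country": "GB"},
    {"year": 1932, "country": "US"},
]

def counts_for_year(mentions, year):
    """Count mentions per country, keeping only the given year."""
    return Counter(m["country"] for m in mentions if m["year"] == year)

counts_1931 = counts_for_year(mentions, 1931)
```

Here `counts_1931` holds two mentions for "GB" and one for "DE"; the 1932 record is filtered out.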
Caveats
The dataset is by no means perfect. Geocoding ran into several obstacles; here are two of them, along with the workarounds we used:
- Place names are not unique: there is a Boston in the US and a Boston in the Philippines. Solution: when generating the list of known places, if two cities share a name, only the one with the larger population is retained (population data from GNC).
- Some cities share a name with a country: there is a city named "Netherlands" in the US. Solution: after geocoding, we go back through the geocoded locations and replace any place that shares its name with a country with the geocode of that country.
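The two workarounds above can be sketched as follows. This is an assumption-laden illustration rather than the project's code: the function names and record fields are hypothetical, the population figures are placeholders rather than GNC values, and a country code stands in for the country's geocode.

```python
def build_known_places(cities):
    """Keep one record per city name: the one with the larger population."""
    known = {}
    for city in cities:
        key = city["name"].lower()
        if key not in known or city["population"] > known[key]["population"]:
            known[key] = city
    return known

def override_countries(geocoded, country_by_name):
    """Reassign places that share a name with a country to that country."""
    fixed = []
    for place in geocoded:
        code = country_by_name.get(place["name"].lower())
        fixed.append({**place, "country": code} if code else place)
    return fixed

# Placeholder populations, chosen only so that the US city is the larger one.
cities = [
    {"name": "Boston", "country": "PH", "population": 10_000},
    {"name": "Boston", "country": "US", "population": 600_000},
]
known = build_known_places(cities)

# A mention that was matched to the US city named "Netherlands".
geocoded = [{"name": "Netherlands", "country": "US"}]
fixed = override_countries(geocoded, {"netherlands": "NL"})
```

After these steps, "Boston" resolves to the US record and the "Netherlands" mention is attributed to the country rather than the US city. The population rule is also one plausible reason why ambiguous names tend to resolve to US locations.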
This is a work in progress and there are still issues. The most significant one is that US locations are overrepresented, so figures for the US should be taken with a grain of salt.