2014-11-18

Country statistics

When doing various analyses of country statistics, I've run into a few problems:

  • The name of the same country differs across various sources
  • Some countries may be missing
I had to match them manually in my spreadsheet, and it was boring.

Since I plan to do this quite often, I created a tool. I ran it several times on data from various sources and collected several names for some countries, for example the Falkland Islands:

  • Falkland Islands
  • Falkland Islands (Malvinas)
  • Islas Malvinas
  • Malvinas
For others, I made compromises, because they're disputed, ambiguous, and/or overlap with one another. For example, I've associated all of the following with the code PS (for Palestine):
  • Palestine
  • Palestinian Territory
  • Gaza
  • West Bank
  • West Bank and Gaza
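The core idea is a many-to-one table from name variants to a single country code. Here is a minimal sketch in Python. It assumes a hypothetical layout for country_codes.csv with one "code,name" pair per row (the real file may be organised differently), and it matches names case-insensitively after collapsing whitespace:

```python
import csv
import io

# Hypothetical layout: one "code,name" pair per row. The real
# country_codes.csv may be organised differently.
SAMPLE_CODES = """\
FK,Falkland Islands
FK,Falkland Islands (Malvinas)
FK,Islas Malvinas
FK,Malvinas
PS,Palestine
PS,Palestinian Territory
PS,Gaza
PS,West Bank
PS,West Bank and Gaza
"""


def normalize(name):
    """Lower-case and collapse runs of whitespace so trivial variants match."""
    return " ".join(name.split()).casefold()


def load_aliases(f):
    """Build a dict mapping each normalized name variant to its country code."""
    aliases = {}
    for code, name in csv.reader(f):
        aliases[normalize(name)] = code.strip()
    return aliases


def lookup(aliases, name):
    """Return the code for a name, or None if it has not been seen yet."""
    return aliases.get(normalize(name))


aliases = load_aliases(io.StringIO(SAMPLE_CODES))
print(lookup(aliases, "islas  malvinas"))  # FK
print(lookup(aliases, "West Bank"))        # PS
print(lookup(aliases, "Atlantis"))         # None
```

Because several names can point at the same code, disputed or nested regions such as Gaza and West Bank all collapse onto PS. Names that come back as None are the ones that still need an entry in the table.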
While this is a pretty simple script, it can save you time. If you wish to contribute (even small additions to country_codes.csv), send me a pull request.

Here's something that resulted from combining 2012 data from the World Bank, current prices (in USD) of a combo meal from Numbeo, and population data from Wikipedia. I just had to arrange the data in CSVs with rows of the form "country,value", and the script took care of merging them.
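Merging then amounts to resolving each country name in every file to its code and joining on that code. This minimal sketch assumes two-column "country,value" inputs and uses a small hypothetical alias dictionary in place of the real country_codes.csv. The input values are dummies for illustration, not real statistics:

```python
import csv
import io

# Hypothetical alias table (normalized name -> code); in practice this
# would be loaded from country_codes.csv.
ALIASES = {
    "norway": "NO",
    "venezuela": "VE",
    "venezuela, rb": "VE",
    "falkland islands": "FK",
    "islas malvinas": "FK",
}


def normalize(name):
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(name.split()).casefold()


def read_source(f, aliases, unmatched):
    """Read 'country,value' rows into {code: value}; record unknown names."""
    values = {}
    for row in csv.reader(f):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        name, value = row[0], row[1]
        code = aliases.get(normalize(name))
        if code is None:
            unmatched.add(name)
        else:
            values[code] = value
    return values


def merge(sources, aliases):
    """Join several sources on country code; missing values become ''."""
    unmatched = set()
    tables = [read_source(f, aliases, unmatched) for f in sources]
    codes = sorted(set().union(*tables))
    rows = [[code] + [t.get(code, "") for t in tables] for code in codes]
    return rows, unmatched


# Dummy inputs for illustration only; these are not real statistics.
gdp = io.StringIO('Norway,1\n"Venezuela, RB",2\n')
meal = io.StringIO("norway,10\nIslas Malvinas,20\nAtlantis,30\n")

rows, unmatched = merge([gdp, meal], ALIASES)
out = io.StringIO()
csv.writer(out).writerows(rows)
print(out.getvalue())
print("Unmatched:", sorted(unmatched))
```

A country missing from one source simply gets an empty cell in that column, and the unmatched names are exactly the ones worth adding to the alias file. Values are kept as strings so the merge never reformats the numbers.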

Norway is right next to Venezuela, so I guess this doesn't really say much about countries. Or does it?



I pasted the result into Google Drive and made this pretty bubble chart. Knock yourselves out!