Posts Tagged ‘open-data’

scatterplot-v8-combined

This is a Processing sketch that visualizes climate change data from the World Bank Data Indicators. A switch in the upper right corner toggles between two visualizations, and hovering the mouse over a circle shows the data for that country. The color scheme was borrowed from Cynthia Brewer’s Color Brewer tool.

Process:

  1. I downloaded data from World Bank Data indicators in the Climate Change category. I noticed problems with the raw data:
    • Not every country listed has data for the same year.
    • There are aggregates of countries within the dataset such as “East Asia and the Pacific” or “Sub-Saharan Africa”. While these may be useful, for my data-viz I’m attempting to compare only countries.
  2. To keep only countries from the raw data, I:
    • Grabbed category data for 2010, as this was the latest year available for the CO2 kt category.
    • Combined the data into one spreadsheet.
    • Used a join in QGIS to a world dataset file so that only matching countries would be imported. I used the ‘Country Code’ key from the World Bank data and joined it to the ‘Adm0_a3_is’ column of Natural Earth’s ne_10m_admin_0_countries.shp.
    • To only keep matching countries from Natural Earth and the World Bank I filtered out data with NULL values using an SQL WHERE clause:
      • SQL SELECT * FROM ne_10m_admin_0_countries WHERE "pop_gni_co2_2010_Country Name" != 'NULL' AND "pop_gni_co2_2010_GNI_2010" != '' AND "pop_gni_co2_2010_CO2_2010" != '' AND "pop_gni_co2_2010_POP_2010" != ''
  3. I noticed that some countries lacked data for 2010. To fix this I manually filled the gaps with data from the most recent earlier year available (probably not the most statistically sound method). A couple of countries, such as Myanmar and North Korea, did not have any data from the World Bank, so they were excluded from the viz.
    • The countries shown that had no data for 2010, and the year of the data used instead, are listed below:
      • Argentina GNI (country & per capita) 2006
      • Greenland GNI (country & per capita) 2009
      • Libya GNI 2009
      • Sudan and South Sudan: I split Sudan’s CO2 evenly as South Sudan has no data in this category (probably due to it being a relatively new country)
      • Djibouti GNI (country & per capita) 2005
      • Somalia / Somaliland GNI (country & per capita) 1990
      • Iran GNI (country & per capita) 2009
  4. In future iterations the following features would be added:
    • Animate the circles when switching visualizations.
    • Add an ability to turn on/off the logarithm.
    • Highlight all circles by region when hovering over a region in the legend.
    • Improve the typography.
    • Add a label for the toggle switch.
    • Show a projection of the trend into the future.
    • Toggle a view to display the data on a world map.
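
The cleanup in steps 2 and 3 above can be sketched roughly in Python. This is not the QGIS workflow itself, only an illustration of the same logic: drop anything whose code is not in a reference list of countries, fall back to the most recent earlier year when 2010 is missing, and exclude countries with no data at all. The record layout, the `COUNTRY_CODES` set, the function names and every number are made up for the example.

```python
from typing import Dict, List, Optional, Tuple

# Assumed stand-in for the country codes in the Natural Earth layer.
COUNTRY_CODES = {"ARG", "USA", "MMR"}

# Hypothetical records: indicator -> {year: value}. All values are invented.
RECORDS = [
    {"code": "ARG", "gni": {2005: 1.0, 2006: 2.0}, "co2": {2010: 3.0}, "pop": {2010: 4.0}},
    {"code": "EAS", "gni": {2010: 9.0}, "co2": {2010: 9.0}, "pop": {2010: 9.0}},  # regional aggregate
    {"code": "MMR", "gni": {}, "co2": {}, "pop": {}},  # no data at all
    {"code": "USA", "gni": {2010: 5.0}, "co2": {2010: 6.0}, "pop": {2010: 7.0}},
]


def latest_at_or_before(series: Dict[int, float], year: int) -> Optional[Tuple[int, float]]:
    """Return (year, value) for the most recent observation no later than `year`."""
    years = [y for y in series if y <= year]
    if not years:
        return None
    best = max(years)
    return best, series[best]


def clean(records: List[dict], target_year: int = 2010) -> List[dict]:
    """Keep countries only, filling gaps from the most recent earlier year."""
    kept = []
    for rec in records:
        if rec["code"] not in COUNTRY_CODES:
            continue  # skip aggregates such as regions
        filled = {"code": rec["code"]}
        complete = True
        for indicator in ("gni", "co2", "pop"):
            hit = latest_at_or_before(rec[indicator], target_year)
            if hit is None:
                complete = False  # nothing to fall back on
                break
            filled[indicator + "_year"], filled[indicator] = hit
        if complete:
            kept.append(filled)
    return kept


cleaned = clean(RECORDS)
```

Keeping the source year next to each value (`gni_year` and so on) makes it easy to list which countries were backfilled, as in the bullets under step 3.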

Code is available on Github.


Screen Shot 2013-09-16 at 8.34.55 PM

link to interactive map

The inspiration for this map came from Rebecca Solnit’s Infinite City, an illustrated atlas that collects maps and writings about San Francisco. One of my favorite maps in the book is titled “Poison/Palate”. The map shows locations of sites designated as ‘palate’, ‘poison’, ‘poison/palate’ or ‘EPA Superfund’. Farmers markets, organic farms and well-known eateries are juxtaposed with nuclear research laboratories, chemical plants, and Silicon Valley’s legacy of tech waste. The map I made shows only the locations of NYC’s farmers markets, each linked to its nearest Superfund site (within city limits; there are many more just outside, in other counties and New Jersey). With more time I would include other types of ‘palate’ and ‘poison’ sites, such as notable NYC eateries and restaurants. Further user interaction with the data would be developed as well, such as the ability to search for all sites within a certain distance of an address entered by the user.

Data was obtained from the EPA and NY State open-data. CartoDB hosts and renders the data live; a PostGIS SQL query links the two datasets, and a subtle light-grey base-map that I imported from MapBox puts the data in context. To process the data I used GDAL’s ogr2ogr utility to extract the farmers markets in the five boroughs from the statewide NY dataset. I used QGIS to perform a spatial query on the EPA data to select only sites within the NYC boroughs (this could probably be done with ogr2ogr but I’m not certain).
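
For readers without PostGIS at hand, the two processing ideas above (keep only markets in the five boroughs, then link each one to its nearest Superfund site) can be sketched in plain Python. This is a stand-in, not the query running on CartoDB: it uses a great-circle (haversine) distance and a brute-force search, and it assumes each market row carries a `county` field. The market and site names and coordinates are hypothetical.

```python
import math

# The five boroughs correspond to these New York State counties.
NYC_COUNTIES = {"Bronx", "Kings", "New York", "Queens", "Richmond"}

# Hypothetical sample rows; names and coordinates are invented for illustration.
MARKETS = [
    {"name": "Market A", "county": "New York", "lat": 40.7359, "lon": -73.9911},
    {"name": "Market B", "county": "Kings", "lat": 40.6720, "lon": -73.9690},
    {"name": "Market C", "county": "Albany", "lat": 42.6526, "lon": -73.7562},
]
SITES = [
    {"name": "Site X", "lat": 40.6740, "lon": -73.9900},
    {"name": "Site Y", "lat": 40.7370, "lon": -73.9400},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def link_nearest(markets, sites):
    """Return (market, nearest site, distance in km) for each NYC market."""
    links = []
    for market in markets:
        if market["county"] not in NYC_COUNTIES:
            continue  # outside the five boroughs
        distances = [
            (haversine_km(market["lat"], market["lon"], s["lat"], s["lon"]), s["name"])
            for s in sites
        ]
        dist_km, site_name = min(distances)
        links.append((market["name"], site_name, dist_km))
    return links


links = link_nearest(MARKETS, SITES)
for market_name, site_name, dist_km in links:
    print(f"{market_name} -> {site_name} ({dist_km:.1f} km)")
```

A brute-force search is fine for a few hundred markets and a handful of sites; a spatial index, which PostGIS provides, matters only for much larger datasets.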

*note: when searching for Superfund data, the EPA Environmental Dataset Gateway is a good place to start. Superfund is the common name for the program created by the “Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA)”, a term that comes up a lot on the EPA websites.

**the list of NYC farmers markets I used appears incomplete. The dataset I originally downloaded from NYC open-data contains more records, but the address data is not easily geocodable; there are many addresses like “Crotona Park South & Clinton Ave, in Crotona Park” instead of the typical format of street address, city, state, zip code. Given more time I could have reconciled the state and city datasets and included more market locations.