Category Archives: Maps

37Bill Grid data — average miles traveled per day per household

MAPC provided a shape file dividing the state on a 250m grid (~15 acre blocks). The file is pretty unwieldy, dividing the state into 355,728 segments, for which only 67,919 have data (less than 20% of the grids have a metric for “mipdaybest” which I used to filter the data). Here is a picture of the state with only the grids with data colored:

PopData

To accomplish this rudimentary image I converted the provided shape file to GeoJSON in python using the pyshp library, filtering out records with no data.  This code is based on an example by M. Laloux, modified to use an iterator over the records and drop any with no data (here I am using “mipday_phh” as a proxy for no data).

import shapefile
import itertools
import sys
sourceName = sys.argv[1] #"../data/grid_250m_attr.shp"
destName = sys.argv[2]   #"data/grid_250m_attr_removedEmpty.json"

# read the shapefile
reader = shapefile.Reader(sourceName)
fields = reader.fields[1:]
field_names = [field[0] for field in fields]

mipday_phh = field_names.index("mipday_phh")

buffer = []

for (sr, ss) in itertools.izip(reader.iterRecords(), reader.iterShapes()):

    if sr[mipday_phh] == 0.0:
        #print "skipping", sr
        continue
    atr = dict(zip(field_names, sr))
    geom = ss.__geo_interface__
    buffer.append(dict(type="Feature", geometry=geom, properties=atr)) 

    # write the GeoJSON file
from json import dumps
geojson = open(destName, "w")
geojson.write(dumps({"type": "FeatureCollection","features": buffer}, indent=2) + "\n")
geojson.close()

I then opened the file in QGIS (Why did I convert it to GeoJSON you ask? Because I am planning to work with it in d3 next). With QGIS I can’t see the grid color by default because it is overwhelmed by the default black border on each shape. This post discusses how to remove the outline. I ended up using the last option (the old Symbology), but what a pain!

Besides the difficulty of working with such large files (did I mention with all the yak shaving I am doing here I got a bigger hard drive too?). The grid data is also awkward to link to other data sets, which more typically use zip code information. (Update: The companion data set Tabular section includes zip code information for the centroid of each grid cell). Since the grids are regular, many span multiple zip codes/municipalities. Furthermore, addresses were mapped to grid locations based on estimates of street location, so for some rural areas some data was mapped to the wrong grids etc. Regardless of those challenges, the pictures below show clearly that driving patterns vary greatly within a city or zip code.

Here are a few plots looking at the data in QGIS showing Miles Per Day per Household, where the blues are under 35 miles per day, yellow around 75 and orange is over 100 miles per day. It’s a horrible viz, but I can’t bring myself to spend time on the color config in QGIS (way too painful). Note the municipality boundaries shown are drawn from a separate shapefile layer provided by MAPC.

mipd_phh

The same image is shown below zoomed in on the Boston area. There are unsurprising large swaths of long distance car commuter communities ringing the city. I included a legend here FWIW, the colors were picked manually from the QGIS color picker (which leaves a bit to be desired), hence the abysmal scheme. The divisions were automatic using the equal bin option. This data is clearly flawed! We can see from the legend that there is at least one grid cell with an average of over 6 thousand miles a day! I screen grabbed this legend from the layers toolbar in QGIS and gimp-shopped it in for your viewing pleasure.

MilesPerDayPerHouseholdBostonArea

Next up, working on a workflow for rendering nice images and figuring out what data and stories to tell.

Data used in this post was provided by Vehicle Census of Massachusetts, Metropolitan Area Planning Council 2014 and licensed under a Creative Commons Attribution 4.0 License.

Interactive Choropleth Map of Morphine Access around the world.

Continuing to explore the GAPRI data on access to pain medication around the world, I decided to try a choropleth map (with country colors now corresponding to morphine/death). Check out the working interactive choropleth map.

screenshot of cloropleth map of morphine access around the world

I got to use a few new tools in getting this to work:

  1. In order to use the d3.geo.path which converts GeoJSON to SVG for display in a web page, I needed first to covert my shapefile with the GAPRI data into a GeoJSON format. For future use I will probably explore ogr2ogr to do this transformation. But for my initial test I used a web based MyGeodata Converter, which worked like a charm.
  2. d3 Winkel Tripel Projection, this projection is in the geo.projection d3-plugin. I had some trouble getting it to work, I think because parts of this are in transition with d3.v3.  I ended up using versions based on this example.
  3. d3 HsL color interpolation: Pretty simple. You first need to scale your data from 0-1, then pass that into the color interpolator. I found the docs a little confusing on this point, so here is an example:
    // normalize your data
    var normScale = d3.scale.linear()
        .domain([0, _(data).max()])
        .range([0,1]);
    // create the color interpolator
    var interpColor = d3.interpolateHsl('#EC8076','#84BB68');
    // later get the color corresponding to your data
    color = interpColor(normScale(d))
  4. Mouseover Highlight: To highlight the country by color on mouse over used:
    .on("mouseover", function(e){d3.select(this).style("fill", "steelblue")})
    .on("mouseout", function(d){d3.select(this).style("fill", MapColors(d))})
  5. Tooltips: I used svg:title for the tooltips. Super simple, just append(“svg:title”) to each path and set the .text() to what you want to display. I spent more time handling the special case of “No data”.
  6. d3 Zoom Behavior: The zoom behavior leaves something to be desired, but it was simple enough to get the d3.behavior.zoom() to work based on this example, the important code is svg.call() and the redraw() function.

Todo:

  1. Connect the table legend and the map interactively.
  2. Use a better zoom functionality that is more discoverable (zoom buttons and grab icon for panning).

Using Mapnik to render custom colored Map images

Say I want my output map of the world to be colored with a customer specified set of specific colors for countries, water etc.  I also want to output a specific resolution for publication.

Map of World rendered in Mapnik

Natural Earth shapfiles have a the built in “mapcolor” attribute to group countries appropriately for this purpose, that will be helpful!

Q-GIS has some nice functionality for playing around with the shapefile and exploring the data. But for a final output image it doesn’t meet the following requirements:

  • Controlling output image resolution and cropping (you can save to .png or .jpg, but there are no other options regarding resolution or dimensions to save).
  • Configuring colors and color ranges to use can be done (quick and dirty demonstration that the mapcolors attribute is what I expect is easy). But using Q-GIS quickly becomes a super manual process, you have to specify colors via separate RGB fields, ugh. You can specify an attribute to use for the color categories, but seems hard in that mode to configure specific colors to use.
  • Must reload modified shapefiles and re-configure them anytime the source file changes.

I don’t doubt that there are plugins for saving files (say if you want a vector image!!) and I could probably create my own symbology configuration for the colors I want, but since I am only catering to myself and don’t want to become an expert in some RSI inducing graphical interface… time to find a scriptable way to generate high quality output images.

Mapnik is a map renderer that has bindings for lots of languages including python. I started with Getting started with mapnik in python, which walks through a simple example in no time. First problem solved: simply specify the image size when you declare the map:

m = mapnik.Map(600,300)

And specify the area you want to display when you save it:

m.zoom_to_box(mapnik.Box2d(-180, -90, 180, 90)) # or m.zoom_all()  
mapnik.render_to_file(m,'world.png', 'png')

Unfortunately, coloring the country groups is not trivial. Mapnik doesn’t current have the ability to set the PolygonSymbolizer’s fill color based on a shapefile attribute (feature in the works). However, as I am now an expert at manipulating shapefiles with pyshp, I hacked together an ugly but straightforward solution in a few minutes.

  1. Using pyshp, read in the shapefile to render and spit out n new files, one for each mapcolor group. See split_mapColors.py
  2. Create a unique mapnik style for each country color group and add each new shapefile to the map with the appropriate style layer. See renderCommon.py
Putting it all together, I used the script renderBaseline.py and other utility functions to generate the colored map of the world above. 

Using the same techniques morphed shapefiles can be rendered in the same color pallet, change your mind about  the colors? No problem.

Generating a cartogram of access to pain medication around the world.

Okay! Now that I have my data in a shiny new shapefile. Time to make some cartograms using ScapeToad.

The Data:  Morphine per Death as reported by the Global Access to Pain Relief Initiative (GAPRI)

I am working on this project with Kim Ducharme for WGBH, The World, for a series on cancer and access to pain medication, which highlights vast disparities in access between the developed and developing world. Below is a snippet of the data obtained from GAPRI, showing the top 15 countries for amount of morphine available/used per death by Cancer and HIV and the bottom 15 for which there were data.

                                   Country   mg Morphine/ Death
                            --------------   -------------------
                             United States   348591.575146
                                    Canada   332039.707309
                               Switzerland   203241.090828
                                   Austria   180917.511535
                                 Australia   177114.495731
                                   Denmark   160465.864664
                Iran (Islamic Republic Of)   149739.130818
                                   Germany   144303.589803
                                   Ireland   140837.280443
                                 Mauritius   121212.701934
                            United Kingdom   118557.183885
                                     Spain   116480.684253
                               New Zealand   112898.731957
                                   Belgium   108881.848319
                                    Norway   106706.195632

And the 15 countries with the least amount of morphine access:

                                   Country   mg Morphine/ Death
                            --------------   -------------------
                                   Burundi       38.261986
                                  Zimbabwe       34.508702
                                     Niger       31.359717
                                    Angola       30.485112
                                   Lesotho       25.998371
                                  Ethiopia       25.323131
                                      Mali       24.713729
                                    Rwanda       23.269946
                                  Cameroon       15.162560
                                      Chad       10.866740
                             Côte D'Ivoire        9.723552
                                  Botswana        9.352994
                                   Nigeria        8.780894
                              Sierra Leone        8.546830
                              Burkina Faso        7.885819

Traditional Cartogram

Based on these numbers of morphine/death, in a basic cartogram where each country’s area becomes proportional to the metric, Switzerland would be 60% of the size of the US. But wait… this wasn’t what I expected, gosh that’s ugly and hard to read… And so starts the cartogram study and tweaking experiment. Is there a perfect solution?

Morphine per Death (as Mass)

Note the countries of Europe are too constrained to get to their desired sizes, so there is always some error in these images. Regardless of that there are two issues: 1) Europe/Africa/Asia are so badly distorted as become nearly unreadable, and bring the emphasis to a fish-eye view of Europe with weird France and Switzerland shapes. 2) This seems to make the whole story about Europe, de-emphasizing the US and Canada, which have higher usage than any of the European countries and also taking the focus away from shrunken Africa/Asia and South America.

This seems to be the best that the diffusion based contiguous cartogram is going to be able to do for this data set. ScapeToad has some options for mesh size and algorithm iterations, none of which seem to significantly effect the output image in this case. The other option is to take your metric and apply it as a “Mass” (as above) or as a “Density” to each shape. ScapeToad explains what the Mass/Density distinction is pretty well:

In our case Morphine/death is a “Mass/Mass” ratio which is also a “Mass”. However, for kicks I ran the “Density” option which is technically wrong (scales the area of each country based on the metric, instead of making the area proportional to the metric as a traditional cartogram should). Low and behold, the density image is certainly more satisfying and seems to tell a better story, although over-emphasizing the role of the US, Canada and Australia, which all dwarf Europe:

Morphine per Death (as Density)

Well, this is a quandary, the “correct” image is too confusing to be useful and takes the focus away from the story about the developing world and into what-the-?-is-this-distorted-picture land. But the “density” image is not “correct”.

From here I spent some time trying to generate a less distorted mass based cartogram. By running the cartogram generation on each continent separately I generated much less distorted images of Europe and Africa (Asia still needs some work). Shown here are the raw outputs for these regions in green, purple and pale blue respectively.

Morphine per Death (as Mass) by region, unscaled

To piece the cartogram back together the continents needed to be scaled and translated to the correct locations. Here is how far I got in that process. Europe is much easier to read and Africa is a huge improvement. Asia/the Middle East are still quite confusing, potential for improvement breaking this into more chunks, but it was becoming a more and more manual process and the output image still isn’t “satisfying”.

Morphine per Death (as Mass) each region calculated separately, then scaled appropriately to maintain more recognizable shapes

Does this cartogram tell the story we want? Does it really make sense to honor country borders and make small countries as large as big countries that have the same morphine/death value? For example, all things remaining equal, if the German and French speaking parts of Switzerland split into two new countries, given the same morphine/death number should each of the two halves have a cartogram area equal to previous Switzerland, effectively doubling the size because of  a political change? That doesn’t make much sense, but that would be considered a technically “correct” cartogram measure. It seems to me in some ways scaling the area is more correct as in the Morphine/Death as Density image, as it doesn’t exaggerate small countries with smaller populations…

In the end it is possible to generate a lot of different cartogram images. Some of which are suggestive of the story you want to tell, none of which are easily deciphered to provide actual data numbers. Keeping in mind that a cartogram isn’t a tool for communicating precise data measures, I think pick the one you like that makes sense to you vis-á-vis your data and the story you want to tell, don’t overstate the accuracy of the image, and provide other means to get at the actual numbers. For example, I created an alternate view in this choropleth map of the same data.

UPDATE 2012/12/03:

The final image is published now as part of PRI’s The World new series on Cancer’s New Battleground — The Developing World.

Access to Pain Medication around the World

Adding custom data to a shapefile using pyshp 1.1.4

As part of a cartogram generating project I need to get data from a .xls file into a shapefile for use with ScapeToad. Reading the excel file is easy with xlrd. Shapefiles? that is new territory.

Shapefile resources:

ScapeToad requires a high quality shapefile as input. I tested two that worked fine, both include population data suitable for testing:

  1. Thematic Mapping: http://thematicmapping.org/downloads/world_borders.php TM_WORLD_BORDERS_SIMPL-0.3.zip. Which is licensed under the Create Commons Attribution-Share Alike License. This license is not appropriate for commercial use and the author didn’t respond to my question regarding other licensing options.
  2. Natural Earth: On a tip from Andy Woodruff, I switched to using Natural Earth shapefiles which are licensed completely in the public domain, suitable for anything you want! Note, I discovered that the most zoomed out Natural Earth file, “ne_110m”, didn’t have shapes for some of the smaller island countries in my data set, so switched to using the “ne_50m” versions which included everything I needed.
Next step, getting custom data into the shapefile.

Using pyshp to add data to a shapefile

Since I do most of my data processing in python anymore, I was happy to find a python module for read/write/editing shapefiles. Unfortunately, pyshp.py is not the best maintained module. I used pyshp version 1.1.4, because it was packaged in ubuntu. After discovering a number of bugs I realized they have already been reported but nothing significant seems to have been fixed in 1.1.6. So I will just document the workarounds I used here.

1st pyshp 1.1.4 workaround: Renamed the shapefiles to remove any dots in the file name (the 0.3 in the case of thematic mapping shapefiles) because pyshp can’t handle extra dots in the file name.

This is kinda a nuisance since there are 4 files in a “shapefile”. This command will rename the extensions “dbf”, “prj”, “shp” and “shx” all at once:

 for a in dbf prj shp shx;do mv TM_WORLD_BORDERS-0.3.$a TM_WORLD_BORDERS_dot_free.$a;done

2nd pyshp 1.1.4 workaround: Massage numeric data you are adding to a record to have the correct precision.

My whole reason for using pyshp is to add data from excel into the shapefile. This means adding fields to identify the data and then adding the data to the record for each shape. The format of the new attributes (a.k.a. fields) is well described here. In my case I want to add numbers for example: sh.field(‘MY_DATA’, ‘N’, 6, 3). The number args are width and precision, where width is the total number of characters to represent the number and precision is the number of characters after the decimal. The above (6,3) can encode: -1.234 and 98.765.

Note, pyshp will error (AssertionError assert len(value) == size) if you put data into the record with greater precision than specified (it will not truncate for you).  I used a simple hack below to get a precision of 3 for my data:

    def precision3(n):
        ''' Force argument to precision of 3'''
        return float('%0.3f'%n)

3rd pyshp 1.1.4 workaround: When adding a new data field, pad all the records with default data for the new attribute.

pyshp assumes when saving the file that the data is perfectly formatted, but doesn’t help too much when adding or deleting data. Records are stored in python as a list of lists, when the shapefile is written pyshp assumes that the records lengths equal the number of fields (as they should be). But it is your job to make sure this is true (if not the records will wrap around and become non-sense). Q-GIS is useful for inspecting shapefile attribute tables to discover issues and verify that your new shapefile works in an independent reader.

In my case data wasn’t available for all countries, so I padded with a default value (appended to the end of all records when adding the field) and then looped through and put the correct data in the records for which data was available.

Example here adding a new field for Numeric data and default data to all records. All my data is non-negative, so magic number “1” is for the decimal point.

    def addField(name, widthMinusPrecision, precision = 3, default = 0): 
        sf.field(name, 'N', widthMinusPrecision+precision+1, precision)
        # add default data all the way down to keep the shape file "in shape"
        for r in sf.records:
            r.append(default)
        return

4th pyshp 1.1.4 workaround: delete() method doesn’t work, don’t use it.

Each shape is described by two pieces of data, linked together based on their index. When deleting a shape, both the record (with the meta data) and the shape (with coordinates etc) must be removed. If only one is deleted pyshp will add a dummy entry at the end and many of your records and shapes won’t line up anymore. To delete a shape, you must delete both the shape and the corresponding record. The delete method doesn’t do this, don’t use it, do it yourself:

    def deleteShape(n):
        del sf._shapes[n]
        del sf.records[n]
        return

5th pyshp 1.1.4 workaround: Handle non-ascii character encodings yourself

pyshp doesn’t declare a character encoding when reading files, so they default to “ascii”. If you are using the Natural Earth shapefiles they have non-ascii characters and are encoded in Windows-1252. (See previous post for more info about the Natural Earth encoding.) I worked around this by looping over the records and encoding all strings to unicode:

    for r in sf.records:
        for i,f in enumerate(r):
            if isinstance(f, str):
                r[i] = unicode(f,'cp1252')

And then reversed this before saving the file via:

    for r in sf.records:
        for i,f in enumerate(r):
            if isinstance(f, unicode):
                r[i] = f.encode('cp1252')

6th pyshp 1.1.4 workaround: When looking at sf.fields adjust the index by one to ignore ‘DeletionFlag’

pyshp adds an imaginary field for internal state tracking to the beginning of the fields list. If you are looking up field names in this list to find indexes, you should correct your indexes accordingly, there is not actually a field called ‘DeletionFlag’.

Conclusion:

After working around these bugs and massaging my country names to map from .xls to the names in the shapefile (17 cases of “Bolivia (Plurinational State Of)” == “Bolivia”) , I was able to use pyshp to generate a new shapefile with my data in it! Next up, cartogram-orama.

Cartogram Basics

I am working on a Cartogram of the World with my friend Kim Ducharme. We are looking for something dramatic to show disparity between affluent countries and developing countries and a cartogram seems a great way to hit the message home. This is my first foray in to the world of cartograms, so here is some useful background.

A cartogram is a map where the areas of regions have been adjusted to represent some other metric of interest. They are intentionally distorted “maps” and yes there is controversy over them :).

Cartograms come in a few different flavors (see indiemaps summary):

  1. Non-contiguous Cartograms: Each object (state/country/etc) grows or shrinks independently of its neighbors. With the result being perfectly accurate, but the original map is filled with white space. Excellent history of Non-contiguous cartograms here.
  2. Dolring Cartograms: Replace the regions with circles or squares I won’t discuss these more.
  3. Contiguous Cartograms: Attempts to keep boundaries connected and distorts the shapes (often grossly) attempting to scale the country areas according to some metric. The canonical approach seems to be the Gastner/Newman diffusion method.
Our preliminary investigation showed that the Non-contiguous Cartogram was not a very satisfying image. Too much white space and just doesn’t have the punch of a wildly distorted contiguous cartogram.
Here are is an example of the diffusion method image generated by Mark Newman, this one represents Total spending on Healthcare, check out Mark Newman’s site for more in this family:

Having decided to pursue generating a contiguous area cartogram, first step was to find out how.

Selecting a Cartogram program:

There are a few options I found for generating a contiguous area cartogram:

  1. Download the source for Gastner/Newman’s cart program, compile and run it. Looks doable, but would need to massage my data into some grid format. Maybe someone else has made it easier…
  2. Apparently ArcGIS has a Cartogram Geoprocessing Tool based on the Gastner/Newman method too. But I don’t have a budget for fancy commercial software.
  3. ScapeToad: At last, a ready to go cartogram generating program also based on the Gastner/Newman method. This one has a GUI and is released under the GPL. Perfect!

Using ScapeToad to generate a cartogram:

Using ScapeToad is easy! It has simple instructions and I only ran it a few issues easy to work around. It uses ESRI shapefiles and will output the updated image as an ESRI shapefile (with error  data added in). I also found the ScapeToad documentation pretty helpful. The only annoying issue  on my system was the recently used files selection silently didn’t work. So when opening a shapefile (via “Add Layer…”) I had to always browse to the correct location. I will discuss about the “Mass” and “Density” options in another post. Otherwise there is not much to say, I will let ScapeToad speak for itself.

ScapeToad requires the shape file to have “perfect contiguity”, so find a suitable shapefile and test it before moving on. As discussed in a previous post, the Natural Earth shapefiles are now my go to. These files conveniently have some population data you can use to test the ScapeToad is working.

More on the real challenges, adding your own data to a shapefile, coming up…

Reprojecting maps with QGIS

I figured out how to reproject maps with Q-GIS and would like to celebrate with this lovely image, a US Atlas Equal Area Projection, which I thought was the most fun of the built in projections in Q-GIS 1.8.0:

If the Q-GIS documentation site is working it isn’t too hard to figure out how to reproject. Quick summary:

  • File >> Project Properties: You can set the project Coordinate Reference System (CRS), if you “Enable ‘on the fly’ CRS transformation” you can play around with the different projection options.
  • To set the CRS on a shapefile layer, right-click on the layer name and select “Set Layer CRS”, this is particularly useful if you don’t have ‘on the fly’ enabled, then your layers will not display if they are in a different CRS than the current project one, in which case you can change them here so they will display.

Note for practical equal area projections I found the Mollweide options seemed to work well, but don’t understand yet the difference between world and sphere options. Anyone?

Funny, it wasn’t untill I started looking at these equal area projections that I realized how big Russia is!

A better simple map

I am learning to use pyshp.py (a pretty buggy but functional python module for reading and writing ESRI shapefiles) and Quantum GIS. As a quick demonstration I replotted the data from an earlier map. Q-GIS makes it pretty easy to adjust the appearance. The world map shapefile is from Natural Earth.

This time I generated a shapefile from python directly, super easy (I will highlight problems with pyshp in a future post, but creating this simple file worked fine, although doesn’t seem to define a CRS, I am pretty sure it is WGS 84). Here is the code:

import shapefile
w = shapefile.Writer(shapefile.POINT)
max_len = max([len(s.name) for s in stations])
w.field(‘NAME’,’C’,’%i’%max_len)
for s in stations:
    w.record(s.name)
    w.point(s.lon, s.lat)

w.save(‘stations’)

Mapnik experiments

After a day of working with mapnik, I have my first supremely ugly plot, which shows Hubway station locations around Boston, the colors are by station name prefix, which is roughly by area, but not that logical outside Cambridge and Somerville…

Ha! Decided to try to add some color to these water bodies myself in the sytlesheet. Ended up coloring all of the following:

<Filter>[natural] = ‘water’ or [natural] = ‘lake’ or [natural] = ‘bay’ or [natural] = ‘wetland’ or [natural] = ‘marsh’ or [gnis:feature_type] = ‘Bay’ or [landuse] = ‘reservoir’ or [landuse] = ‘basin’ or [waterway] ‘canal’ or [waterway] = ‘boatyard’ or [wetland] = ‘wet_meadow’ or [wetland] = ‘tidalflat’ or [wetland] = ‘saltmarsh’ or [wetland] = ‘swamp’
</Filter>

Which still didn’t manage to get any of the Boston Harbor or the Mystic River, however, I am tabling it for a while.

Putting data on a map

Here is my first plot of some data on a map. This happens to be the locations of seismic monitoring stations around the world.

I used a equirectangular projection of the world map from wikipedia, which means that lat/long coordinates map easily to pixel coordinates. Also used the python csv library to read the data and the python imaging library to modify the image with red dots for monitoring station locations.