2014-04-21

Cluj map of time to Piața Unirii

I'm given the task of buying an apartment in Cluj-Napoca. My parents want to buy me one, and I greatly appreciate this, so I want to make an informed decision.

I want it to be close to the city center. However, the ones that are close are either too expensive, or in disrepair. I don't own a car, nor do I intend to in the near future, and I want to commute by bus (or perhaps by bike).

Thankfully, Google introduced public transit directions in Cluj-Napoca recently, and it's been indispensable to all freshmen who came to study in the student city.

I used it too, more exactly, its API, and you can see how in this post, full of juicy technical details.

Here's the map I obtained:



I noticed that one can get to Mănăștur faster than I thought. I was probably more familiar with the east side of the city, and that's why it seemed to take longer. But I suppose it having a lot of bus stations and few intersections to be the reason why it's quick to get downtown.

Wonders of Google Maps API


As you can see, we have new Hyperbuses which must have been difficult to map, with all their jumps between stations, so we understand Google's delays ;)

All jokes aside, the transit data is still very valuable, even if not perfect. I realized I could use these directions, through the Google Maps API Web Services, to find out how long the commute would take. Also, I felt like procrastinating on the final paper that I must do for college.

So, as I read through the documentation, I come up with the following query:

https://maps.googleapis.com/maps/api/directions/json
?origin=46.774699,23.621780
&destination=46.76817,23.579759
&sensor=false
&mode=transit&arrival_time=1399356000
This returns a JSON of the available buses from my current location (the Economica 2 dorm) to my workplace. The arrival time is a random Tuesday at 9AM (EEST), and mode is set to "transit", to get transit directions, instead of ones for a personal vehicle.

Among the plethora of data returned, I find the most important piece:
"duration" : {
    "text" : "31 mins",
    "value" : 1883
},

That's exactly what I'm looking for. The text is 31 minutes, and the value looks like it's the duration in seconds, which is even more precise (though not necessarily more accurate): 31.383 minutes.

So, how to automate all this?

First, I check with a completely browser-unrelated tool that I don't need any login cookies set or anything.

wget "https://maps.googleapis.com/...

And it still works. Well, it seems it's very easy to use, and freely available to anyone. The usage limits say the limit is 2,500 directions requests per 24 hour period. That should be about enough for creating a tool which generates, say, 50x50 pixel map of duration via bus, in one day. But I could easily create a more detailed map, given more time.

This should be easy to implement using Python and matplotlib.

Getting data for one point

The first step is getting the distance from one point to my workplace using Python. This is done very easily, since Python comes pre-packaged with lots of stuff, such as urllib2 for fetching a resource at a URL, and json, for parsing, you guessed it, JSON data. Here's a Python snippet:

import urllib2
import json

response = urllib2.urlopen('http://maps.googleapis.com/maps/api/directions/json?origin=46.774699,23.621780&destination=46.76817,23.579759&sensor=false&mode=transit&arrival_time=1399356000')

data = json.load(response.read())

I paste the JSON into a collapsible viewer, JsonFormatter, so I can more easily understand its structure. The data I'm looking for, in my particular query, seems to be data['routes'][0]['legs'][0]['duration']['value'] .

I'm worried because there are a lot of 0's in my path, because those elements happened to be the first in a list. But what if there are more possible ways of reaching my destination? Are they sorted? In what order?

Thankfully, Google's documentation provides these details, and says that there is a leg for every destination or waypoint in my trip. Since I only have one destination and no waypoints, I don't need to worry, and my query is always going to be the one above. I get the minimum time, however, if you look at the code on GitHub.

Which points to get?

If you study what techniques geographers use when mapping, you'll find one called kriging. This is what an oil surveyor would use, to determine the points which would offer the most information. However, I'm a simple man, and the Google API allows me to bruteforce it, so I'll use simpler (haha, just go on reading and find out) maths.

I could use Google's API to determine if a point is in Cluj-Napoca. But I prefer a simpler way, which abuses their servers less. Here's where linear algebra comes into play. If you look at Cluj-Napoca, it kind of looks like a parallelogram:



If you read the following Stack Overflow response, a more mathematically inclined peer enlightens us. We're going to use part A of his answer. His words are as follows:
p = A + u*AB + v*AD 
This equation is able to generate any point in a parallelogram defined by A, B, and D, by tuning u and v from 0 to 1.

So, we choose u's and v's from 0 to 1, to cover our map uniformly. Actually, it's only going to be really uniform if the lengths of the sides are quite similar, otherwise, the points are going to be denser over one axis. To compensate, one would multiply the number of points on the longer axis by the ratio of the axes' lengths. But I don't bother. I just pick 11 points on each side, since the lengths are approximately equal.

Using numpy's linspace function, you can easily create a grid of n points - pairs of u's and v's covering the rectangle (0,0)-(1,0)-(1,1)-(0,1) uniformly. Here's how:
us = numpy.linspace(0, 1, int(numpy.sqrt(n)))
vs = us

for u in us:
    for v in vs:
        point = numpy.array([u, v])
(Edit: after this, I found out about numpy.mgrid. Facepalm.)

Actually, we plop those points into the equation, and we get our coordinates:
ab = b-a
ad = d-a

...
point = a + (u*ab) + (v*ad)
 

Fancy some style?

When we have all points, we group together their coordinates (into lists which I name X, Y, Z), and pass them to an interpolation function, to force them into a grid for creating a pretty contour plot.

xi = numpy.linspace(min(X), max(X), 100)
yi = numpy.linspace(min(Y), max(Y), 100)
zi = griddata(X,Y,Z,xi,yi,interp='nn')
CS = plt.contourf(xi,yi,zi,15,cmap=plt.cm.rainbow,
                  vmax=abs(zi).max(), vmin=-abs(zi).max())
plt.show()
I go on to make sure that no datapoint is ever lost, by using a Python dictionary to record them uniquely, and by loading and saving them to a file.

Also, I overlaid this map over Google's, so we can find out what exactly we're looking at. I tried plotting a few dozen times. First to get the coordinates right, and then to try out various styles.

I reached Google's query limit - but I want to thank them for offering any amount of this service for free. It's like a taxi worker losing a client because of telling you where something is, instead of taking you himself.

I ended up using complete transparency when the commute is quickest, and completely opaque purple when it's slowest, because purple is the least used in Google Maps. Also, I slightly darkened the datapoints' positions.

I'm proud of the result (click to see a larger view):



You can download the source code and play with it. Feel free to send me a pull request if you tweak it into something even more awesome.

commute-duration-map