Innovation and monopolies

Is there a link between innovation and being a monopoly? Without data this is only going to be hypothesizing, but here goes.

Quite the contrary, I would argue: small firms are more incentivized to innovate to build a reputation, whereas monopolies tend to cash in on the one they already have.

Big, long-term plans can certainly be fulfilled through collaboration between many companies and/or agents, because at each step of the way, more value is added.

Take the story of a pencil:
  • Someone mines and sells graphite for profit.
  • Someone cuts timber for a profit.
  • Someone extrudes graphite lead for a profit.
  • Someone shapes pencil wood encasings for a profit.
  • Someone puts the lead in the wood and sells it to you for profit.

No "grand" planning was needed. Perhaps graphite lead is also used directly by artists, or maybe new kinds of mechanical pencils appear, reducing demand for wood encasings in favor of plastic/metal ones. Everyone acting at every level has an incentive to innovate, and is perhaps better suited to it, since they can focus on their own job. Monopolies rely on managers instead.

The same can happen for lots of things. Did you make something you don't know what to do with? Sell it to whoever buys it. The coal miner doesn't know or care whether his coal will become a pencil or electricity. He just cares that he's making a profit - that is, that his work is deemed useful and encouraged by his clients.
Did you research something that can offer competitive advantage? Become a consultant. Did you notice something other people find hard to do but you can do easily? Start doing it and selling it to them.

A monopoly is often blind to things like these, not noticing that people work on useless things (Google employees manually editing maps, for example (I just invented this, it isn't true)). The market would quickly replace them with people who can do it more cheaply/efficiently (say, freelancers instead of using $150k/year employees). The market is much more modular than a monopoly.

Sure, Google for instance innovates. But does, say, Comcast innovate? Comcast just uses lobbyists to make sure people can't compete with them. The only innovation is in convincing the government to make laws in its favor and in manipulating the masses.

And that's the only advantage of a monopoly - being able to deal with red tape. Because you need large groups of people to become a majority, and lots of money to lobby in favor of your laws - laws against potential threats to your insane profit margins, laws friendly to the "competitors" on whose turf you secretly agree not to step.

Some monopolies have a global reach (like Google or Coca-Cola), which means they do well irrespective of any one country's laws - perhaps because they're genuinely better.
Others (like Comcast or Electrica) do well strictly in their own country, probably because they control a large amount of what is going on in it (like owning the media and manipulating people, or having relations with certain well-placed officials).

This is my brain dump. It is by no means final, just my belief so far. Please attack it! :)


Country statistics

When doing various analyses of country statistics, I've run across a few problems:

  • The name of the same country differs across various sources
  • Some countries may be missing

I had to match them manually in my spreadsheet, and it was boring.

Since I plan to do this quite often, I created a tool. I ran it several times with data from various sources, and collected several names for some countries - for example, the Falkland Islands:

  • Falkland Islands
  • Falkland Islands (Malvinas)
  • Islas Malvinas
  • Malvinas

For others, I made compromises, because they're disputed, ambiguous, and/or included in one another. For example, I've associated all of the following with the code PS (for Palestine):
  • Palestine
  • Palestinian Territory
  • Gaza
  • West Bank
  • West Bank and Gaza

While this is a pretty simple script, it can still save you time. And if you wish to contribute (even small additions to country_codes.csv), send me a pull request.

Here's something that resulted from combining 2012 data from the World Bank, current USD prices of a combo meal from Numbeo, and population data from Wikipedia. I just had to arrange the data in CSVs with rows of the form country, value, and the script took care of merging them.
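The merging step can be sketched like this (a minimal sketch: the alias table, variable names, and sample numbers below are made up for illustration - the real mappings live in country_codes.csv):

```python
import csv
from io import StringIO

# Hypothetical alias table; the real mappings live in country_codes.csv.
ALIASES = {
    'Falkland Islands': 'FK',
    'Falkland Islands (Malvinas)': 'FK',
    'Islas Malvinas': 'FK',
    'Malvinas': 'FK',
    'West Bank and Gaza': 'PS',
    'Palestinian Territory': 'PS',
}

def merge(sources):
    """Merge several country,value CSVs into {code: [one value per source]}."""
    merged = {}
    for i, text in enumerate(sources):
        for name, value in csv.reader(StringIO(text)):
            code = ALIASES.get(name, name)  # normalize; fall back to the raw name
            merged.setdefault(code, [None] * len(sources))[i] = float(value)
    return merged

# Two made-up sources naming the same countries differently:
gdp = "Falkland Islands,164.5\nWest Bank and Gaza,11.3\n"
meal = "Islas Malvinas,12.0\nPalestinian Territory,8.5\n"
# merge([gdp, meal]) == {'FK': [164.5, 12.0], 'PS': [11.3, 8.5]}
```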

Norway is right next to Venezuela, so I guess this doesn't really say much about countries. Or does it?

I pasted the result into Google Drive, and made this pretty bubble chart. Knock yourselves out!


Letter to EFF on Net Neutrality

I sent a letter to the Electronic Frontier Foundation, in response to their campaign for net neutrality:

As an independent observer, I'm delighted that you do a great job of informing people about what ISPs and big companies come up with in order to get more money. However, I have a few comments about the reasons for which you want to stop this:
  1. If ISP companies wish to grant special privileges to certain players, I think they should have the freedom to do so - because they own their routers and infrastructure, and should be able to do whatever they please with them. Legislating for net neutrality is an attack on their freedom.
  2. Perhaps allowing this type of contract will make it cheaper for typical consumers to use mass/high-traffic sites (like the ones provided by those who can pay for privileges).
I think you should implement and fight for another kind of solution: developing a crowd-sourced ISP monitoring browser plugin, for example, and aggregating and publishing the extent to which ISPs discriminate their traffic. This way, consumers are aware and able to make a conscious decision, and the market will find a balance which will please the people the most.
However, a fundamentally more important problem in the US (as I see it) is the over-regulation of telecom, which leads to few alternatives and only a handful of ISPs holding monopolies (just like in the tobacco industry). You should work to reduce regulation, to enable more players to enter the market more easily, instead of increasing it (for example by giving ISPs an extra form to fill in - the net neutrality one).


Cluj map of time to Piața Unirii

I've been given the task of buying an apartment in Cluj-Napoca. My parents want to buy me one, and I greatly appreciate this, so I want to make an informed decision.

I want it to be close to the city center. However, the ones that are close are either too expensive, or in disrepair. I don't own a car, nor do I intend to in the near future, and I want to commute by bus (or perhaps by bike).

Thankfully, Google introduced public transit directions in Cluj-Napoca recently, and it's been indispensable to all freshmen who came to study in the student city.

I used it too - more precisely, its API - and you can see how in this post, which is full of juicy technical details.

Here's the map I obtained:

I noticed that one can get to Mănăștur faster than I thought. I was probably more familiar with the east side of the city, and that's why Mănăștur seemed farther than it is. I suppose having a lot of bus stops and few intersections is the reason it's quick to get downtown from there.

Wonders of Google Maps API

As you can see, we have new Hyperbuses which must have been difficult to map, with all their jumps between stations, so we understand Google's delays ;)

All jokes aside, the transit data is still very valuable, even if not perfect. I realized I could use these directions, through the Google Maps API Web Services, to find out how long the commute would take. Also, I felt like procrastinating on the final paper I have to write for college.

So, as I read through the documentation, I come up with the following query:

http://maps.googleapis.com/maps/api/directions/json?origin=46.774699,23.621780&destination=46.76817,23.579759&sensor=false&mode=transit&arrival_time=1399356000

This returns a JSON of the available buses from my current location (the Economica 2 dorm) to my workplace. The arrival time is a random Tuesday at 9 AM (EEST), and mode is set to "transit", to get transit directions instead of ones for a personal vehicle.

Among the plethora of data returned, I find the most important piece:
"duration" : {
    "text" : "31 mins",
    "value" : 1883
},

That's exactly what I'm looking for. The text is 31 minutes, and the value looks like it's the duration in seconds, which is even more precise (though not necessarily more accurate): 31.383 minutes.

So, how to automate all this?

First, I check with a completely browser-unrelated tool that I don't need any login cookies set or anything.

wget "https://maps.googleapis.com/...

And it still works. Well, it seems it's very easy to use, and freely available to anyone. The usage limits allow 2,500 directions requests per 24-hour period. That's just enough to generate, say, a 50x50-point map of bus-trip durations (2,500 points) in one day. And I could easily create a more detailed map, given more time.

This should be easy to implement using Python and matplotlib.

Getting data for one point

The first step is getting the travel time from one point to my workplace using Python. This is done very easily, since Python comes pre-packaged with lots of stuff, such as urllib2, for fetching a resource at a URL, and json, for parsing, you guessed it, JSON data. Here's a Python snippet:

import urllib2
import json

response = urllib2.urlopen('http://maps.googleapis.com/maps/api/directions/json?origin=46.774699,23.621780&destination=46.76817,23.579759&sensor=false&mode=transit&arrival_time=1399356000')

# json.load() expects a file-like object, so pass the response directly
data = json.load(response)
I paste the JSON into a collapsible viewer, JsonFormatter, so I can more easily understand its structure. The data I'm looking for, in my particular query, seems to be data['routes'][0]['legs'][0]['duration']['value'] .

I'm worried by all the 0's in my path: those elements just happened to be the first in their lists. But what if there are more possible ways of reaching my destination? Are they sorted? In what order?

Thankfully, Google's documentation provides these details: there is a leg for every destination or waypoint in the trip. Since I have only one destination and no waypoints, I don't need to worry, and my query is always going to be the one above. (If you look at the code on GitHub, though, I take the minimum time over all returned routes anyway.)
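Extracting the duration defensively - taking the minimum over whatever routes come back - can be sketched like this (the function name and the sample response are mine, for illustration):

```python
def shortest_duration(data):
    """Pick the shortest duration, in seconds, out of all returned routes.

    With one destination and no waypoints, each route has exactly one leg.
    """
    return min(route['legs'][0]['duration']['value']
               for route in data['routes'])

# A made-up response with two alternative routes:
sample = {'routes': [
    {'legs': [{'duration': {'text': '31 mins', 'value': 1883}}]},
    {'legs': [{'duration': {'text': '35 mins', 'value': 2112}}]},
]}
# shortest_duration(sample) == 1883
```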

Which points to get?

If you study what techniques geographers use when mapping, you'll find one called kriging. This is what an oil surveyor would use, to determine the points which would offer the most information. However, I'm a simple man, and the Google API allows me to bruteforce it, so I'll use simpler (haha, just go on reading and find out) maths.

I could use Google's API to determine if a point is in Cluj-Napoca. But I prefer a simpler way, which abuses their servers less. Here's where linear algebra comes into play. If you look at Cluj-Napoca, it kind of looks like a parallelogram:

In the following Stack Overflow answer, a more mathematically inclined peer enlightens us. We're going to use part A of his answer. In his words:
p = A + u*AB + v*AD 
This equation is able to generate any point in a parallelogram defined by A, B, and D, by tuning u and v from 0 to 1.

So, we choose u's and v's from 0 to 1, to cover our map uniformly. Actually, it's only going to be really uniform if the lengths of the sides are quite similar, otherwise, the points are going to be denser over one axis. To compensate, one would multiply the number of points on the longer axis by the ratio of the axes' lengths. But I don't bother. I just pick 11 points on each side, since the lengths are approximately equal.

Using numpy's linspace function, you can easily create a grid of about n points - pairs of u's and v's covering the rectangle (0,0)-(1,0)-(1,1)-(0,1) uniformly. Here's how:
import numpy

# sqrt(n) evenly spaced values per axis gives roughly n grid points in total
us = numpy.linspace(0, 1, int(numpy.sqrt(n)))
vs = us

for u in us:
    for v in vs:
        point = numpy.array([u, v])
(Edit: after this, I found out about numpy.mgrid. Facepalm.)

Then we plop those points into the equation, and we get our coordinates:
ab = b - a  # edge vector from corner A to corner B
ad = d - a  # edge vector from corner A to corner D

point = a + (u * ab) + (v * ad)

Fancy some style?

When we have all points, we group together their coordinates (into lists which I name X, Y, Z), and pass them to an interpolation function, to force them into a grid for creating a pretty contour plot.

import numpy
import matplotlib.pyplot as plt
from matplotlib.mlab import griddata

xi = numpy.linspace(min(X), max(X), 100)
yi = numpy.linspace(min(Y), max(Y), 100)
zi = griddata(X, Y, Z, xi, yi, interp='nn')  # natural-neighbor interpolation
CS = plt.contourf(xi, yi, zi, 15, cmap=plt.cm.rainbow,
                  vmax=abs(zi).max(), vmin=-abs(zi).max())
I go on to make sure that no datapoint is ever lost, by using a Python dictionary to record them uniquely, and by loading and saving them to a file.
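The caching idea can be sketched as follows (a minimal version: the file name, key format, and function names are illustrative, not necessarily what the script uses):

```python
import json
import os

CACHE_FILE = 'durations.json'  # hypothetical file name

def load_cache():
    """Load previously fetched durations, or start fresh."""
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            return json.load(f)
    return {}

def save_cache(cache):
    with open(CACHE_FILE, 'w') as f:
        json.dump(cache, f)

def duration_at(lat, lng, cache, fetch):
    """Return the cached duration for a point, calling fetch() only on a miss."""
    key = '%.6f,%.6f' % (lat, lng)
    if key not in cache:
        cache[key] = fetch(lat, lng)
    return cache[key]
```

This way, re-running the script after hitting the query limit picks up where it left off instead of re-fetching every point.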

Also, I overlaid this map over Google's, so we can find out what exactly we're looking at. I tried plotting a few dozen times. First to get the coordinates right, and then to try out various styles.

I reached Google's query limit - but I want to thank them for offering any amount of this service for free. It's like a taxi driver losing a fare by telling you where something is, instead of taking you there himself.

I ended up using complete transparency when the commute is quickest, and completely opaque purple when it's slowest, because purple is the least used in Google Maps. Also, I slightly darkened the datapoints' positions.
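That style can be reproduced with a matplotlib colormap (a sketch; the colormap name and the exact shade of purple are my guesses, not the values the script uses):

```python
from matplotlib.colors import LinearSegmentedColormap

# Fully transparent at the quickest commutes, opaque purple at the slowest.
slowness = LinearSegmentedColormap.from_list(
    'transparent_purple',
    [(0.5, 0.0, 0.5, 0.0),    # RGBA: purple with alpha 0
     (0.5, 0.0, 0.5, 1.0)])   # RGBA: purple with alpha 1
```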

I'm proud of the result (click to see a larger view):

You can download the source code and play with it. Feel free to send me a pull request if you tweak it into something even more awesome.