2012-05-25

Will teaching soon be over?

MOOCs (Massive Open Online Courses) are sprouting up all over the Internet. While this seems like a good thing, since education becomes much more accessible, it's also a double-edged sword: it will have a great impact on society, and, inevitably, some people will have to find other jobs. Let me explain how that could happen.
So far, these courses seem quite harmless, but sooner or later some of them will become better and better recognized by industry. They are also very cheap to produce: new content only needs to be created once, or updated when our scientific understanding changes, and the same material can be reused many times and broadcast at an ever-decreasing cost (think of cloud services such as YouTube or Amazon's EC2).
This content also allows much greater interactivity - you can pause and rewind a lecture to understand it better, you can keep track of your progress on each of the concepts presented in the course, and you can receive exercises and examples tailored by a machine specifically for you. Contrast this with a traditional lecture, which you have to follow linearly and cannot rewind if you miss something - perhaps losing track of what is being taught.
There are still some obstacles to overcome, such as verifying the credentials of a user or making the assessments and exams credible, but if the courses truly manage to teach the needed material - and some will almost certainly manage to do it - then these problems are trivial, in my opinion. Some people disagree, however.
So, if high-quality education is available online for free (or almost for free), then schools and universities become redundant. People will be just as able to build careers as if they had actually attended a university, which is bad news for universities, since few people will want to enroll. Therefore, many teachers lose their jobs.
This is another example of technological unemployment. Teachers, people who have devoted their lives to teaching, might be forced to find other jobs. Even though teaching is a highly specialized and very difficult job to get right, demand for it will drop drastically in the coming years, just as it did for many menial trades that suffered the same fate due to automation.
Such phenomena are actually indicators of progress. There can be no improvement without change, and this can be seen as society taking off a band-aid.

2012-05-18

All fuzzy inside

Sometimes, a whole lot of times, things aren't as simple as black and white. Sometimes they are thrown together and blended into an unintelligible mess. If you have to deal with such a mess, then strict, boolean logic-based rules and algorithms will only make your mess bigger (and maybe throw in some segfaults).
Sometimes you need to estimate a state, to respond proportionally, or to reason under uncertainty. In those cases, you must use a probabilistic interpretation instead of a brittle, hardcoded case-based controller.
Maybe the simplest and most robust system that deals with this is fuzzy logic.
[Figure: fuzzy thermostat membership functions (cold / warm / hot) - shamelessly stolen from Wikipedia]
As you can see in the graph, the temperature is a continuous variable, while the system works with three overlapping categories: cold, warm, and hot. This could drive, for instance, a thermostat that can cool or heat up a room.
Some of its robustness lies in being able to transition from one state to another smoothly, without starting and stopping abruptly at the borders between them. The slopes of the membership functions can also be adjusted, to compensate for whatever subjectivity there is in choosing the thresholds, or for whatever errors there are in the measurement, since the real world can be very noisy.
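To make this concrete, here's a minimal sketch of what such a fuzzy thermostat might look like in Python. The threshold temperatures, the shapes of the membership functions, and the heating/cooling outputs are all made up for illustration - they are not taken from the figure above.

```python
# A minimal fuzzy thermostat sketch. All thresholds and outputs are
# invented for illustration only.

def triangular(x, left, peak, right):
    """Degree of membership (0..1) in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def thermostat_response(temp_c):
    """Blend three actions (heat, do nothing, cool) by their memberships."""
    cold = triangular(temp_c, -10.0, 5.0, 18.0)
    warm = triangular(temp_c, 15.0, 21.0, 27.0)
    hot  = triangular(temp_c, 24.0, 35.0, 50.0)

    # Crisp action for each set: +1 = full heating, -1 = full cooling.
    actions = [(cold, 1.0), (warm, 0.0), (hot, -1.0)]

    total = sum(m for m, _ in actions)
    if total == 0.0:
        return 0.0
    # Weighted average (a simple defuzzification step): the output changes
    # smoothly as the temperature crosses the overlapping regions.
    return sum(m * a for m, a in actions) / total

if __name__ == "__main__":
    for t in (0, 10, 17, 21, 25, 30, 40):
        print(t, round(thermostat_response(t), 2))
```

Notice that the output never jumps: as the temperature slides through the overlap between "warm" and "hot", the cooling ramps up gradually instead of switching on at a single hard threshold.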
There also exist more advanced controllers in this spirit, such as the PID controller, which also takes into account the integral and the derivative of the error signal with respect to time. However, in order to decide how much importance is assigned to each component (the proportional, integral, and derivative responses), extra parameters are required, which shrinks the basin of initial conditions that lead to a "good" solution. The more parameters that need tuning, the easier it is to end up with an unstable system.
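For comparison, here's a rough sketch of a PID controller - again just an illustration, with placeholder gains; picking kp, ki and kd well is exactly the tuning problem mentioned above.

```python
# A rough PID controller sketch, assuming a fixed time step dt.
# The gains below are placeholders, not recommended values.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt                  # accumulated error
        derivative = (error - self.prev_error) / self.dt  # rate of change of error
        self.prev_error = error
        # The output is a weighted sum of the three responses.
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# e.g. controller = PID(kp=1.2, ki=0.1, kd=0.05, dt=0.1)
#      power = controller.step(setpoint=21.0, measurement=18.5)
```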
This is loosely similar to the principle of Occam's razor - the explanation that requires the fewest assumptions is usually the most likely to be correct. It's also why laziness is one of the programmer's virtues - generally, the simpler the code, the better it is (provided it does the required job).

2012-05-13

On Kaggle and the Turing Test

There's this website called Kaggle, where you can compete at data analysis and win money (lots of it, if you place first). Essentially, people build models of data provided by some organization, and the models with the best scores get awarded prizes (or karma).
There was recently a competition in which contestants were asked to come up with a solution for automatically grading essays. The sponsor provided examples for training the models: essays along with their human-assigned grades.
The scoring metric, a "kappa" statistic in this case, ranges from 0 (agreement no better than chance) to 1 (perfect agreement with the human grades).
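As far as I can tell this is the quadratic-weighted kind of kappa, where disagreements that are far apart cost more than near misses - the quadratic weighting is my assumption, not something spelled out here. A rough sketch of how such a metric could be computed:

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Agreement between two sets of integer ratings: 1 = perfect, 0 = chance."""
    rater_a = np.asarray(rater_a) - min_rating
    rater_b = np.asarray(rater_b) - min_rating
    n = max_rating - min_rating + 1

    # Observed confusion matrix between the two raters.
    observed = np.zeros((n, n))
    for a, b in zip(rater_a, rater_b):
        observed[a, b] += 1

    # Expected matrix if the raters were independent (outer product of marginals).
    expected = np.outer(np.bincount(rater_a, minlength=n),
                        np.bincount(rater_b, minlength=n)) / len(rater_a)

    # Quadratic penalty: being two grades off hurts four times as much as one.
    i, j = np.indices((n, n))
    weights = ((i - j) ** 2) / ((n - 1) ** 2)

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# e.g. quadratic_weighted_kappa([3, 2, 4], [3, 3, 4], min_rating=1, max_rating=4)
```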
One contender noticed that there is a certain discrepancy between the human grades - the graders only agree with each other at a kappa of about 0.76. They were worried that this would be a ceiling on how well a computer model could do the job (since humans are supposedly better than computers at understanding human language).
However, when the competition came to an end, the best model agreed with the human grades better than the professional graders agreed with each other, with a kappa of 0.814.
While this may partly be an effect of the algorithms getting a closer estimate of the "true" grade (they could effectively average the human ratings), it still suggests that computers may now be better than humans at grading essays - that a computer can judge how good a given text is more precisely than a person can. I believe this illustrates how quickly technology has been progressing in recent years, becoming almost incomprehensibly better.
The Turing Test holds that if a program can successfully deceive humans into thinking it's another human (over instant messaging, say), then it's safe to call it intelligent.
I wonder, if programs can understand humans better than humans themselves, what will that mean for humankind? Will all our jobs be automated? Will all of us become unemployed? Do we need another economic model?

Intro

Let me tell you what this blog will be about.
I'm a CS student, and from time to time I have too much time on my hands (aka I get bored of doing what I'm supposed to be doing). That's when I look at stuff on the Internet - the great series of tubes linking people together - and find things that I enjoy thinking about.
One example: autoencoders. An autoencoder is a neural network that learns to encode its own input very efficiently and reconstruct it very precisely. Sparse autoencoders are awesome because the codes they learn work very well for classification (and you can easily classify new categories while only retraining the last layer). I'm a fanboy of Andrew Ng (I took his ML class and thought it was awesome).
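To give a flavour of the topic, here's a tiny, purely illustrative autoencoder written with numpy: a single hidden layer trained by plain gradient descent to reproduce its input. All the sizes and constants are made up, and a real sparse autoencoder would add a sparsity penalty on the hidden activations.

```python
import numpy as np

# A tiny one-hidden-layer autoencoder trained to reconstruct random data.
rng = np.random.default_rng(0)
X = rng.random((100, 20))                  # 100 samples, 20 features

n_hidden = 8
W1 = rng.normal(0, 0.1, (20, n_hidden))    # encoder weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_hidden, 20))    # decoder weights
b2 = np.zeros(20)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for epoch in range(2000):
    # Forward pass: encode, then decode.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Gradients of the squared reconstruction error (plain backpropagation).
    delta_out = (output - X) * output * (1 - output)
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)

    W2 -= lr * hidden.T @ delta_out / len(X)
    b2 -= lr * delta_out.mean(axis=0)
    W1 -= lr * X.T @ delta_hidden / len(X)
    b1 -= lr * delta_hidden.mean(axis=0)

print("reconstruction error:", np.mean((output - X) ** 2))
```

The interesting part is that the 20-dimensional input has to squeeze through only 8 hidden units, so the network is forced to learn a compact code for the data.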
The motto of the blog - "Listen to the sound of the machine" - is a quote from Elephants Dream, the first Open Movie. I'm also a fan of open source, and I consider the movie a great achievement.
I intend to post about once a week, though real life has priority.