I was taking the Udacity course on web development when some philosophical insights struck me. Here they are.
Web development is like writing - the expression of your ideas to the world.
Writing is for telling someone what you think. For exposing your thoughts, feelings, and experiences of the world, and letting other people know your beliefs. For improving the lives of others by showing them what can happen. For allowing them to feel what you feel, sympathize, and tell their peers.
Web development does pretty much the same things, except it carries not just thoughts, but also functionality and technology. You show people what is possible - what the world has to offer through computers, and how they can help.
The most important software industries are currently web development and mobile development. If one were forced to pick between the two (not that anyone is), I argue that one should pick web.
This is because web development allows more freedom, both for the developer and for the user - the user is sitting comfortably, casually exploring, seeking new patterns, learning. On mobile, by contrast, the user is in a hurry, wanting to get stuff done, not interested in interacting with a complex system through clumsy interfaces.
On mobile, the user expects a quick experience and a no-nonsense, YAGNI attitude. All clutter must disappear to satisfy that need. So developers must constrain themselves to the most barebones, pragmatic, material needs - location, calendars, quick news/status updates, and basic communication. There simply isn't much interest in doing more complex things.
On web, however, the users have all the time in the world, and want you to amaze them to your best ability. You can show them your intricate and subtle algorithms. You are allowed more complexity, less focus on the necessary, and more on the art.
So there you have it. Web development is a form of expression, while mobile development is solving worldly needs. This is why I choose web development.
2013-04-04
2013-03-31
The cost of a sandwich
Since I'm an avid sandwich eater, I've decided to measure approximately how much it costs to make one sandwich, on average.
The prices for the ingredients are from Auchan in Cluj-Napoca, and the measurements were made by manual approximation, because container shapes aren't exactly simple. I believe the most inaccurate item was the lettuce, since density, leaf area, usable leaf size, and shape all varied with the radius. I just guessed.
The picture on the right doesn't do my accuracy justice, since I only had one hand, but was included for fun. You can see the markers for "0" and "2", meaning how many sandwiches had been made when the ketchup inside was at that position.
The ingredients I used were lettuce, ketchup, ham, cheese, bread, and electricity - because I used a sandwich maker.
The most pointless to compute is the electricity, which means I'll dedicate extra effort to it. It involves unit conversion and adding the VAT (the most complicated mathematical operation involved in this analysis).
My sandwich maker uses 750 Watts, and one sandwich is done in 2min15s. That means I use 0.028125 kWh per sandwich. Electrica Distribuție Transilvania Nord says a kWh of "active power" costs 0.02207 RON. The "reactive power" is negligible here, since my sandwich maker is basically a resistive heater that converts essentially all the electricity to heat. With VAT, that's 0.0273668 RON per kWh. Therefore, it costs me 7.7*10^-4 RON to cook one sandwich. That value is invisible on the graph when compared to the other ingredients.
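If you want to check the arithmetic, here it is in a few lines of Python (the 24% VAT rate is inferred from the numbers above - it was the rate in force at the time):
power_kw = 0.750                              # the sandwich maker draws 750 W
hours_per_sandwich = 135.0 / 3600             # 2 min 15 s
energy_kwh = power_kw * hours_per_sandwich    # 0.028125 kWh
price_per_kwh = 0.02207                       # RON, "active power", before VAT
price_with_vat = price_per_kwh * 1.24         # 24% VAT -> 0.0273668 RON
print energy_kwh * price_with_vat             # ~0.00077 RON, i.e. 7.7*10^-4 RON per sandwich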
The final cost is about 2.3 RON, if you pick either the cheese or the dill cream. If you're planning to start a business, remember the rent and the salaries.
As expected, the most expensive item is the ham. Become a vegetarian and save money, your health, and the world! Or perhaps a partial vegetarian if meat is too tasty. Also, don't take me as a role model. I don't always practice what I preach, try as I might.
Also, you're extremely unlikely to enjoy the same ingredients in the same proportions as me, so your results will vary. They shouldn't vary that much, though. If you do such an analysis yourself, feel encouraged to post your results!
You can view and play with the data here.
Update - 4/12: here's a pic:
Update - 2018-02-03: the cost revisited 5 years later!
2013-03-12
Why I am trying to reduce jobs
If a politician told you he was trying to reduce the number of jobs in the economy, you probably wouldn't vote for him, and you'd consider him a lunatic. Nevertheless, here's what Paul Țiței and I have come up with for improving everyone's lives:
Make everyone unemployed.
What is our reasoning? How could this possibly have any good effects? How would this work?
Well, we believe technology is going to be so disruptive, so efficient and useful, so ubiquitous that it will replace all jobs. There'll be robots doing every menial job there is. After all, why couldn't they, or why shouldn't they? See one of my earlier posts to convince yourself.
Nobody will have to work again. Only the people who enjoy doing something will do it. There'll be learning. There's going to be a utopia. Socialist revolution, anyone? Honestly, I think social democracy is one of the best ways of governing (hint, hint - the Nordic model).
When I study (computer science, which is one of the hardest fields to automate), I sometimes motivate myself by thinking, "a lot of people will lose their jobs because of me, and it's going to be the best thing that happened to them".
Stop looking down on people who lost their jobs. Their job was simply among the most easily automated ones, so they were automated out. They spent an important part of their life training for something they'll no longer be able to do. People are competing with machines, and machines are way more competitive. What does 3D printing mean for carpenters? Automated trucks for freight drivers? Scientist robots for scientists, for crying out loud!
Once nobody has a job anymore, we will have been forced to find a better political model, one that unifies us and provides for us using machines. Everything will be free (or perhaps, given limited resources, meritocratic). Perhaps money will become something like tokens of merit - karma - used only for donations. The world will become a better place.
The moral of the story: take out someone's job!
2013-01-23
How Long Bets works
You may have heard of Long Bets. It's a great website and service that lets people take part in bets lasting a long time (at least 2 years), with the stake paid to a charity chosen by the winning party. People who have heard of it tend to be quite favorably disposed to it. While I approve of the convenience and the publicity each bet gets, I think people should avoid it in favor of individual contracts. Here's why.
As soon as the bet is made, the stake must be paid to The Long Now Foundation as a donation (in addition to the $50 Publishing Fee). This money then gets invested in a long-term portfolio - "The Farsight Fund" - and, through the magic of compound interest, it is expected to grow exponentially.
Compounding becomes really visible over long periods of time. As you can see in this picture shamelessly stolen from Wikipedia, a 20% annual return compounded yearly can yield upwards of 6 times the original sum in just 10 years. A more realistic, less risky level is around 15% - which still manages to quadruple the amount within 10 years.
The deal-breaker for me is that Long Bets gets to keep 50% of that interest - "growth", they call it. They only donate 50% of what has grown out of the original sum to the chosen charity. But the growth becomes significantly greater than the original sum, especially on long-term bets!
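To put rough numbers on that, take a hypothetical 1,000 RON stake growing at the 15% annual return mentioned above:
stake = 1000.0                        # hypothetical stake, in RON
rate = 0.15                           # the "realistic" annual return from above
years = 10
final = stake * (1 + rate) ** years   # ~4046 RON after ten years
growth = final - stake                # ~3046 RON of growth
kept_by_the_site = growth / 2         # ~1523 RON - already more than the original stake
print final, growth, kept_by_the_site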
So, there you have it. If you really want to match a donation by betting for your favorite charity, do it more efficiently: place the stake in a mutual fund yourself instead of giving away half of the growth, and bypass the site with a legally binding agreement.
Also, if you want publicity, announce the bet to your favorite newspaper/site/channel, which will be more than happy to report on it, especially if there's a lot of money involved.
2012-12-26
On parallel computing
C is a certain kind of devil that can turn on you in the blink of an eye. If you're not careful for an instant, it's gonna bite you in the ass. Hard. But parallel computing is another kind of devil that does the same thing. When you combine the two, you get the most intricate problem humankind has to face. But I'll only discuss the parallel part in this post.
In my experience using threads, I've created completely crazy code (as pointed out by my colleagues, Paul Țiței and Sebi Orțan). I'd like to come clean, and help you avoid making the same mistakes.
First, here's the classic tried-and-true formula guaranteed to occasionally spawn a deadlock:
void *a() {
    lock_resource(X);   /* a() takes X first... */
    lock_resource(Y);   /* ...then Y */
    do_work();
    unlock(X);
    unlock(Y);
}
void *b() {
    lock_resource(Y);   /* b() takes Y first... */
    lock_resource(X);   /* ...then X - the opposite order */
    do_work();
    unlock(X);
    unlock(Y);
}
Suppose a() and b() are called in quick succession. Say a() locks X, then, immediately after, b() locks Y. Now a() is waiting for Y, and b() for X. Deadlock.
This can easily be prevented by ensuring that a() and b() lock the resources in the same order. Use pencil and paper, or whatever you like, but you MUST guarantee this. It's the simplest way to make sure this kind of deadlock can't happen.
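If it helps to see the fix concretely, here's a minimal runnable sketch of the same idea - in Python rather than C, only because it's short enough to try as-is; the principle is identical with pthread mutexes. X and Y stand in for the two resources above, and both workers acquire them in the same global order:
import threading
X = threading.Lock()
Y = threading.Lock()
def worker(name):
    # Every thread takes X before Y - one global order, so the cycle above can't form.
    with X:
        with Y:
            print "%s did its work" % name
a = threading.Thread(target=worker, args=("a",))
b = threading.Thread(target=worker, args=("b",))
a.start(); b.start()
a.join(); b.join()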
Our teacher, Rareș Boian, told us in the lecture that once you've run into a deadlock and managed to fix it, your lab project should be OK. Well, I fixed my deadlock, but I created a more subtle problem (very, very wrong, according to my colleagues - and, eventually, me) - which our lab examiner did not notice, luckily (or not) for me:
void *thread() {
    lock_mutex();                             /* mutex taken right at the start... */
    do_things_that_dont_need_mutex();         /* ...but this part doesn't need it */
    do_things_that_need_mutex();              /* only this is the real critical section */
    do_other_things_that_dont_need_mutex();   /* this doesn't need it either */
    unlock_mutex();                           /* ...and released only at the very end */
}
Of course, when I put it this way, you're gonna notice it: my program has no concurrency, defeating the purpose of threads! Since each thread holds the mutex for its whole run, what are the other threads doing in the meantime? Perhaps they're just being handled by the thread-creation and function-call machinery, but that takes a trivial amount of processing power compared to the instructions that don't need the mutex. They don't call it a critical section for nothing! Don't put non-critical things in it, like I did!
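And here's the structure it should have had, again as a quick Python sketch with made-up placeholder work (the point is only where the lock sits): the mutex wraps just the part that touches shared state, so the rest can run in parallel.
import threading
counter = {"value": 0}                # shared state: the only thing that needs the mutex
mutex = threading.Lock()
def thread_body():
    local_work = sum(range(100000))   # doesn't touch shared state - no lock held
    with mutex:                       # critical section kept as small as possible
        counter["value"] += 1
    more_work = local_work * 2        # outside the lock again
threads = [threading.Thread(target=thread_body) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print counter["value"]                # 4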
Maybe this wasn't emphasized enough in the course, or maybe it was just me (skipping some lectures and being really stupid), but I felt the need to point it out to the rest of the world.
While our overworked teachers may have missed this important philosophical point, I am leveraging the power of the Internet (free content distribution) to point it out to you. Be kind and share this with your fellow prospective programmers!
2012-10-14
On dolphins
There's lots of evidence that dolphins are conscious. Some of them have bigger brains than we do. This is an unfortunate state of affairs - some of the species with the biggest brains on the planet have very limited means of physically manipulating their environment, which significantly limits this planet's innovation output.
So I wrote a blog post expressing my respect and solidarity. Maybe some day a dolphin will read it and tell the other dolphins that humanity didn't mean any harm, despite the pollution and other trouble it's brought on them. Especially Japan. Those bastards.
On the other hand, maybe mankind will cooperate with dolphinkind one day. There already exist machines that can facilitate dolphin-computer interaction via sound. Maybe using more advanced AI techniques (such as an autoencoder :D ), a computer could find a mapping between dolphin words (if there are any) and human words. But dolphin language could, as far as we know, be just as complicated as human languages, and we've had a great deal of trouble modeling even those, with all our knowledge of English phonemes, words, and grammar. Dolphinese is a whole new language, alien to us.
Hopefully, however, in the future, our machines will allow us to talk to dolphins, and tell them our ideas and problems, and allow them to express theirs, which will increase the number of intelligent beings having a say on how the Universe works. Or maybe they'll just tell you they kinda like you.
2012-08-29
Web scraping tutorial
I've seen this video about scraping websites and, after trying it myself and finding it fun, I want to write a short, quick tutorial on how to do it. If you know of some handy tool that I haven't covered and that might fit in this tutorial, don't hesitate to comment!
Suppose you're looking for a job. But you're not satisfied clicking on every link you see on a job site such as Craigslist. You need an automated solution that does that for you, because it's too repetitive and boring. But that's what computers are for! Let's turn boring repetitiveness into exciting fascination!
Scraping means getting structured data out of a website using an automated tool. Website owners don't always look kindly on it: a bot doesn't look at any advertisements, so they can't make money off it, yet it relentlessly uses up their bandwidth. For that matter, it might live in a moral twilight zone.
For this I use Python and one of its modules, namely lxml. There are also mechanize, BeautifulSoup, and hpricot, but they're beyond the scope of this tutorial. Watch the video for more.
Also, a better interactive interface for Python is IPython, which feels more like an environment than a simple interpreter: it has auto-completion, syntax coloring, and detailed exceptions, and I think it looks better than IDLE even though it's a console app.
On Ubuntu, installing these was as easy as typing:
sudo apt-get install python-lxml ipython
First, we need a way to tell our Python script which elements of a page we want. We are going to use these things called "CSS Selectors". They're some sort of rules specifying what should and what shouldn't be matched - sort of like regular expressions, but specifically designed for web pages. Here's an example of one matching all bold elements on a page:
b
There you go. Looks simple, right? It's just the letter b. Now here's one that matches all bold, italic, and heading-1 text:
b, i, h1
You get the idea. But real expressions used for finding more specific content aren't as simple. Some might include element IDs, elements following other elements, parents and children, attributes, and so on. Here's the one that matches our data from the tables on that page:
.row a:nth-child(2)
This one matches every link that is the second child of its parent and sits inside an element with the class "row". If you want the details, look here.
But you don't need to learn this syntax in order to do useful stuff. Thank goodness, since it would be such a tedious task. It turns out there's a tool that allows you to visually mark elements and generate the simplest expression for you. It's called SelectorGadget. Such a creative name.
Install it. It's a bookmarklet - a little piece of JavaScript that manipulates the page right in your browser.
For this tutorial, we're going to download all links to job pages on Craigslist Romania. These could, for example, be saved into a text file, one per line, and then you could call:
wget -i file_with_links.txt
to download each of them into the current directory.
Ok. With the SelectorGadget installed, visit the page and click on the bookmarklet. Be warned, all hell may break loose if you do that. Then, click on a link to a job.
When you click on it, lots of things may turn yellow. That means those things have also been selected by the generated selector. To mark one as unwanted, click on it again: it will turn red and serve as a counterexample. Keep doing this until you only have the links to the jobs. It may take a few clicks until you get exactly what you want, and it's not guaranteed to find a solution, but when it does, it saves you a whole lot of effort.
Now that we have this... alien CSS Selectonator thingy... we save it for later. We're going to use it in our program, even though we don't fully understand it. Computers have a mind of their own.
Now, let's learn how to use lxml.html! I picked this one because it's quite robust (it hasn't let me down yet) and very fast - it's a C extension for Python, so it's about as fast as it gets. Fire up IPython and enjoy the view:
Type "copyright", "credits" or "license" for more information.
IPython 0.12.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]:
Ah, bliss!
Now, assuming you did install lxml, write these commands:
from lxml.html import parse
import urllib2
url = urllib2.urlopen('http://bucharest.craigslist.org/jjj/')
Now the raw HTML is available via url.read(), but lxml can read the file object directly. We parse it and get the root node of the resulting tree:
doc = parse(url).getroot()
And to apply the selector to get our job links:
links = doc.cssselect('.row a:nth-child(2)')
To print them, we need their "href" attribute. We can simply iterate over them like this:
for link in links:
    print link.attrib['href']
Great. Hopefully you'll now have your screen full of Craigslist links. This is good.
But we want the links to be in a file. The easiest way to do this is by redirecting the output. Save the commands in a script file, and run it like this:
python CraigsLinks.py > file_with_links.txt
and you've got the file you can use in the command at the beginning of the article. And a lot of jobs! Hopefully you'll find a way to make the computer read them too. Congratulations!
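For reference, here's roughly what that CraigsLinks.py could contain - nothing new, just the snippets from above gathered into one file:
# CraigsLinks.py - the commands from this tutorial, collected into one script
from lxml.html import parse
import urllib2
url = urllib2.urlopen('http://bucharest.craigslist.org/jjj/')
doc = parse(url).getroot()
links = doc.cssselect('.row a:nth-child(2)')
for link in links:
    print link.attrib['href']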
That was surprisingly easy to write, and it could be easily adapted to sites which don't take steps against scrapers. Become filled with astonishment at the simplicity of Python and its modules!
There are many more things to learn: running JavaScript on sites that use it, tricking the site into thinking we're a browser (faking the user-agent string), navigating from page to page ("crawling"), or scraping and crawling in parallel. For more, watch the video and explore the tools' tutorials and documentation.