Moving On
I haven’t been posting on this blog over the past year, so I’ve decided to shut it down and start afresh using Tumblr. All development related posts will be posted at http://blog.taha.me from now on.
I was surprised to see how easy it is to set up a Hadoop cluster on EC2. I thought I’d share the method I used for the benefit of others. To set up the cluster, you need two things-
The following steps should do the trick-
Step 1 – Add the JDK repository to apt and install the JDK (replace lucid with your Ubuntu version, which you can check by running lsb_release -c in the terminal) -
$ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
$ sudo apt-get update
$ sudo apt-get install sun-java6-jdk
Step 2 – Create a file named cloudera.list in /etc/apt/sources.list.d/ and paste the following content in it (again, replace lucid with your version)-
deb http://archive.cloudera.com/debian lucid-cdh3 contrib
deb-src http://archive.cloudera.com/debian lucid-cdh3 contrib
Step 3 – Add the Cloudera Public Key to your repository, update apt, install Hadoop and Whirr-
$ curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install hadoop-0.20
$ sudo apt-get install whirr
Step 4 – Create a file hadoop.properties in your $HOME folder and paste the following content in it.
whirr.service-name=hadoop
whirr.cluster-name=myhadoopcluster
whirr.instance-templates=1 jt+nn,1 dn+tt
whirr.provider=ec2
whirr.identity=[AWS ID]
whirr.credential=[AWS KEY]
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
whirr.hadoop-install-runurl=cloudera/cdh/install
whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
Step 5 – Replace [AWS ID] and [AWS KEY] with your own AWS Access Identifier and Key. You can find them in the Access Credentials section of your Account. Note the third line (whirr.instance-templates): it defines the nodes that will run on your cluster. This cluster will run one node as a combined namenode (nn) and jobtracker (jt), and another node as a combined datanode (dn) and tasktracker (tt).
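To make the format of that third line concrete, here is a minimal Python sketch that splits the value into node groups. The function and the ROLE_NAMES table are my own illustration (they are not part of Whirr), and only the four role abbreviations used in this post are mapped.

```python
# Hypothetical helper, not part of Whirr: it only illustrates how the
# whirr.instance-templates value from hadoop.properties reads.
ROLE_NAMES = {
    "nn": "namenode",
    "jt": "jobtracker",
    "dn": "datanode",
    "tt": "tasktracker",
}


def parse_instance_templates(value):
    """Split a value like '1 jt+nn,1 dn+tt' into (count, [role names]) pairs."""
    templates = []
    for group in value.split(","):
        # Each group is '<count> <role>+<role>...'
        count, roles = group.strip().split(None, 1)
        names = [ROLE_NAMES.get(role, role) for role in roles.split("+")]
        templates.append((int(count), names))
    return templates


if __name__ == "__main__":
    for count, names in parse_instance_templates("1 jt+nn,1 dn+tt"):
        print(count, "node(s) running", " + ".join(names))
```

Read this way, a value such as 1 jt+nn,3 dn+tt would describe one master node and three worker nodes.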
Step 6 – Generate an RSA key pair on your machine. Do not enter any passphrase.
$ ssh-keygen -t rsa
Step 7 – Launch the cluster! Navigate to your home directory and run-
$ whirr launch-cluster --config hadoop.properties
This step will take some time as Whirr creates instances and configures Hadoop on them.
Step 8 – Run a Whirr Proxy. The proxy is required for secure communication between the master node of the cluster and the client machine (your Ubuntu machine). Run the following command in a new terminal window-
$ sh ~/.whirr/myhadoopcluster/hadoop-proxy.sh
Step 9 - Configure the local Hadoop installation to use Whirr for running jobs.
$ sudo cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.whirr
$ sudo rm -f /etc/hadoop-0.20/conf.whirr/*-site.xml
$ sudo cp ~/.whirr/myhadoopcluster/hadoop-site.xml /etc/hadoop-0.20/conf.whirr
$ sudo update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.whirr 50
$ update-alternatives --display hadoop-0.20-conf
Step 10 – Set $HADOOP_HOME in your ~/.bashrc file by placing the following line at the end-
export HADOOP_HOME=/usr/lib/hadoop
Source the .bashrc file-
$ source ~/.bashrc
Step 11 – Test run a MapReduce job-
$ hadoop fs -mkdir input
$ hadoop fs -put $HADOOP_HOME/README.txt input
$ hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount input output
$ hadoop fs -cat output/part-* | head
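If you are new to MapReduce, the following single-machine Python sketch shows roughly what the wordcount job computes: a map step that emits a (word, 1) pair for every whitespace-separated token, and a reduce step that sums the pairs per word. This is only an illustration with function names of my own choosing; it is not the Hadoop implementation and does nothing distributed.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for each whitespace-separated token in a line."""
    return [(word, 1) for word in line.split()]


def reduce_phase(pairs):
    """Sum the counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def word_count(lines):
    """Run the map step over every line, then reduce all emitted pairs."""
    pairs = []
    for line in lines:
        pairs.extend(map_phase(line))
    return reduce_phase(pairs)


if __name__ == "__main__":
    sample = ["hadoop on ec2", "whirr launches hadoop"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

On the cluster, the map and reduce steps run on the tasktracker nodes and the results land in the output directory read by the last command above.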
Step 12 (Optional) – Destroy the cluster-
$ whirr destroy-cluster --config hadoop.properties
Note: This tutorial was prepared using material from the CDH3 Installation Guide.
For my Final Year Project at NUCES, I’m working on a cloud application which will provide real-time collaboration features for development teams who work in distributed environments. I’ve been going through the various cloud platforms available (Force.com, Azure, GAE, EC2, Heroku etc), and I’ve decided to settle on Google App Engine (GAE). Here’s why-
1. Doesn’t cost a dime to start and stays cheap
The ‘What is Google App Engine?’ page says this about the initial pricing-
“With App Engine, you only pay for what you use. There are no set-up costs and no recurring fees. The resources your application uses, such as storage and bandwidth, are measured by the gigabyte, and billed at competitive rates. You control the maximum amounts of resources your app can consume, so it always stays within your budget.
App Engine costs nothing to get started. All applications can use up to 500 MB of storage and enough CPU and bandwidth to support an efficient app serving around 5 million page views a month, absolutely free. When you enable billing for your application, your free limits are raised, and you only pay for resources you use above the free levels.”
In contrast, the Amazon EC2 pricing page sends me to a ‘simple calculator’ to find out what it will cost me.
Also, App Engine is seemingly cheaper than EC2 in the long run, as stated in this article here (though this article claims to be deprecated).
2. No setup hassle
As I’m just beginning my cloud computing journey, I have little or no idea about what my requirements will be in terms of CPU processing, memory, data transfer etc. GAE hides all these details from a novice like me, and allows me to focus on the actual development of the app. When I went to Amazon EC2, I was faced with an array of ‘simple’ steps to complete before I could get the cloud set up.
3. Lets me focus on the app
GAE allows me to focus on developing the app, as I mentioned above. For Python, it comes bundled with the webapp framework, and is compatible with Django as well. The webapp framework also provides useful APIs for Google services such as the Users API which allows simple authentication functionality and account details for Google users. There are similar APIs for other services such as calendars, mail etc. It took me less than half an hour to get the ‘hello world’ app up and running. I can’t even imagine that with EC2, with my current know-how of cloud computing.
4. Comfortable with Python
This isn’t really a reason, but more of an advantage of using GAE for me. I’ve been working with Python and Django for a while now, and I feel that one of the reasons I decided on GAE (in comparison with some of the other small-scale developer friendly platforms) was that I like working with Python. Heroku also allows developers to start developing for the cloud for free and is easy to set up, but I naturally tilted towards GAE because of Python.
Conclusion
GAE is very micro-ISV friendly because of the ease and simplicity of setup, and its cheap pricing options. That’s why I’m going with it. I don’t have the time or the resources to set up something like EC2. However, that said, using GAE is only valid in my scenario because my requirements for the cloud app are generic and don’t require any granular settings. EC2 offers much more flexibility and fine grained control over the application environment. You can basically set up any operating environment using the Machine Images support.
For applications such as enterprise software etc, EC2 is probably the better choice. But as a student with not a lot of resources, GAE offers me the best possible choice for now to start building my app without worrying about things like shards, elastic IPs and On-demand instances! :p
However, there are some things about GAE that are still worrying me, such as platform lock-in and having my entire app on Google servers. We’re worried about our personal privacy when we just have some pictures and text on Facebook servers, what happens when my entire company is based off Google servers? Although I’m sure the Google servers are as safe as a house and as fast as a rabbit, what happens if I want to shift my app off Google servers to some other cloud? There may be some options for porting the code, but the data is definitely locked into the Google Datastore as far as I know.
A great alternative way to create iPhone apps without using a Mac is to use HTML, CSS and JavaScript to deliver applications that appear to be native iPhone apps. For someone like me (who can’t afford a Mac yet) it offers an awesome hassle-free method to deliver targeted content to iPhone users.
The best book on the topic that I’ve found is Building iPhone Apps with HTML, CSS, and JavaScript by Jonathan Stark.
There are two approaches to building iPhone apps this way (to the best of my knowledge)-
I used jQTouch to create a sample app that pulls content from my Twitter feed and displays it. Check it out here.
I’ve always found it cumbersome that I have to describe the structure and design (that is, HTML and CSS) of a website separately, especially when I’m simply designing a web page. It makes more sense (at least to me) to write the CSS of the section containers (header, footer, columns etc) directly with the HTML.
So for that, I came up with a language (and wrote its compiler using Python and Spark) that allows me to express web designs using a hierarchical format that is very similar to writing functions (as in C++ or Java). I used this approach because I think it would make web designing easier for developers.
There are two files that define a web page in Codezyn- a Template file which defines the structure and design of the page, and a Generator file that defines the contents of the page.
I haven’t written any documentation for it (yet). I’m putting up a few tests I wrote along with their compiled results for now-
Test 1: Template | Generator | Output
For the project, I had to manually model and render a 10-second animation using the Renderman interface and any open source Renderman renderer. The basic requirements were to use-
Here it is-
The frames were rendered using Aqsis and the movie was created with Windows Movie Maker. It could have been better if I had given it more time, but I started about a day before submission, so I’m satisfied with the result overall.
A high-res still of the inside scene-
I gave an interview to the CIOPakistan representative who was present onsite during PROCOM, and the interview got featured on CIOPakistan.
Read it here.
I didn’t expect to be winning any more competitions so soon after the recent success at NASCON ’10. But all that was put aside with the arrival of NED TechElite, which took place on Thursday.

The field for the programming competition at TechElite was large, with a total of 97 teams participating from 35 different universities of Pakistan. The event lived up to its name: all the tech-elite were truly there, from big industry names like TRG, Folio3, Systems Limited and TPS to the ever friendly Jehan Ara, President P@SHA and Rabia Garib, Editor CIOPakistan.
Anyway, on to the competition itself. The programming competition lacked a bit in arrangements and judging, but overall it was well managed (although a couple of my friends had serious issues with the slow and cumbersome PCs at NED CIS).
My team (me, Qasmi and Mustafa) took the lead from the first minute, submitting solutions in a timely manner to stay on top throughout the event and take the top position. The coordination within the team was very good throughout, and I believe that is the primary reason behind our success!
The closing ceremony rounded up a very good evening for us, with the PKR 20,000 winning prize and a nice shield to our name.
But wait- the winning streak didn’t end at NED either- on to the home ground battle, PROCOM.NET 2010!

PROCOM.NET took place (at FAST-NUCES Karachi Campus, obviously) yesterday and today. The main programming competition was today, so I went to PROCOM today with a lot of anxious memories of the 2008 event (when it was my first time participating as a freshman) and of the 2009 event (which turned out to be a very frustrating evening for me).
But this time, the tide turned and my team (Me, Qasmi and Osama – Coding Hazard!) took the lead with three lightning fast strikes within the first half hour. From there on, it was just a matter of keeping our cool and getting a couple of other questions out of the way in the remaining 3.5 hours. We managed that comfortably, topping the table throughout the event and taking the prize.
Don’t know the prize money yet, but got the winning shield in the closing ceremony a few hours ago! That concludes the wrap-up of this week’s events, which incidentally bring my tally of wins to 3 in the space of 2 weeks.
Tremendous fun!
Before going to NASCON, I had to scramble to finish the lecture recording software I was working on as part of the SE project. Had the demo for that today, so I’m finally done with it! Yay!
Here are some snaps of the recording tool in action-
Can’t thank FFMPEG enough