Linux Command Line – From Zero to Expert (New Video Course)

I have a new video course on the Linux command line — one of the best investments of time anyone working with computers can make. It helps aspiring computer scientists excel in almost any area they decide to pursue.

The course is aimed at beginners, but there’s something in it even for people who’ve had a little prior experience with the command line.

For my blog readers, I’m making it available for just $9.99 (with a 30-day money back guarantee). Use the following link to get your discount: Linux Command Line – From Zero to Expert

Once you know the command line, you’ll be in a position to learn much more interesting things, such as Linux system administration, VoIP and much more.



Getting Started with Hadoop 2.2.0 — Building

I wrote a tutorial on getting started with Hadoop back in the day (around mid-2010). Turns out the project has moved on quite a bit with the latest versions, and that tutorial is unlikely to work anymore. I tried setting up Hadoop on a single-node “cluster” using Michael Noll’s excellent tutorial, but that too was out of date. And of course, the official documentation on Hadoop’s site is lame.

Having struggled for two days, I finally got the steps smoothed out and this is an effort to document it for future use.

Varnish Cache for WordPress on cPanel

Varnish is an extremely easy-to-configure server cache that can help you counter the ‘slashdot effect’ — high traffic over a short period of time. Varnish does this by sitting between the client and the webserver and serving cached results to the client, so the server doesn’t have to process every page. It’s better than memcache and friends because the request never reaches the webserver at all; you avoid one of the bottlenecks this way. In this tutorial, we’ll cover how to set up Varnish on a VPS (or dedicated server) where you have root access and are running your site using cPanel/WHM. It also applies to situations where you don’t have cPanel/WHM; just skip the cPanel portion in that case. So, let’s get started.
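The core of the setup can be sketched in two pieces of configuration. The file paths and port numbers here are the common defaults on a CentOS-style box and should be treated as assumptions; adjust them to your own server:

```
# /etc/sysconfig/varnish (assumed path): make Varnish listen on the public port
VARNISH_LISTEN_PORT=80

# /etc/varnish/default.vcl (assumed path): send cache misses to Apache,
# which gets moved to another port, e.g. 8080
backend default {
    .host = "";
    .port = "8080";
}
```

With this in place, clients talk to Varnish on port 80 and only cache misses ever reach Apache.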

Helpful Tips for Newbie System Admins

Before you start reading this tutorial, let me remind you once again who the intended audience of this post is — newbie system administrators. If you’re an experienced admin and are going to laugh at my naivete for writing such basic stuff, please go away — or provide some more ‘advanced’ tips in the comments below so that I know better for the future.

Anyway, let’s begin. If you’re like most newbie system admins, you run Windows on your home PC or work laptop and connect to the servers you manage using PuTTY. Everyone uses PuTTY, you say. Well, yes, but there are alternatives. One of the things I hate is having to copy/paste all the usernames and passwords into that PuTTY session before I can log in. So, let’s get rid of that first.

Installing Jira with MySQL

Jira is an extremely well-known issue tracking system, used widely for project management in a wide array of fields. It has quite detailed documentation, but it’s in the form of a wiki and, as we all know, wikis are the worst way of creating software documentation. Anyway, in order to install Jira with MySQL, you have to click and click and click. This tutorial aims to ease that by providing step-by-step instructions on how to install Jira and enable it to connect to MySQL for storage. So let’s begin.

Note: These instructions are for the standalone version of Jira. You can use them quite easily for the WAR/EAR version, but if you’re going that route, you probably don’t need this article.

Ok, first download and install Java (I usually go with the JDK); version 6 is preferred. You can get the .bin file for Linux from Sun’s download site (I refuse to call it Oracle Java). Make it executable and run it; the JDK will be extracted into your current directory. Move it to /usr/share, then set the JAVA_HOME variable.

[sourcecode lang="bash"] JAVA_HOME=/usr/share/jdk1.6.0_23
export JAVA_HOME

You might want to set this in your ~/.bash_profile file.
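For example, appending these two lines keeps JAVA_HOME set across logins. The demo below writes to a scratch file standing in for ~/.bash_profile so it can be run safely anywhere:

```shell
profile=$(mktemp)    # stand-in for ~/.bash_profile
cat >> "$profile" <<'EOF'
JAVA_HOME=/usr/share/jdk1.6.0_23
export JAVA_HOME
EOF
grep JAVA_HOME "$profile"
```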

Next, create a user that we will use to start Jira — for security reasons.

[sourcecode lang="bash"] sudo /usr/sbin/useradd --create-home --home-dir /usr/local/jira --shell /bin/bash jira

Download and extract the Jira standalone .tar.gz file to /usr/local/jira and change the ownership of all the files to the jira user:

[sourcecode lang="bash"] chown -R jira:jira /usr/local/jira

Open port 8080, which is used by Jira by default. Edit the file /etc/sysconfig/iptables and add the following rule:

[sourcecode lang="bash"] -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8080 -j ACCEPT

Set the required jira.home property in the file /usr/local/jira/atlassian-jira/WEB-INF/classes/

[sourcecode lang="bash"] mkdir /usr/local/jira/jirahome
vi /usr/local/jira/atlassian-jira/WEB-INF/classes/
# set the variable jira.home to /usr/local/jira/jirahome

Restart the firewall and start jira using the following command:

[sourcecode lang="bash"] sudo -u jira nohup /usr/local/jira/bin/ run > jira-nohup.log 2>&1 &

Using sudo runs Jira as the jira user, and nohup ensures that Jira won’t stop running as soon as you close the shell. The output produced by the command is saved to jira-nohup.log.

Point your browser at http://your.ip.add.ress:8080/ and make sure you can see the Jira setup page. Don’t bother proceeding with the setup, because as soon as we connect Jira to MySQL, this information will be lost. Let’s do that now.

Connecting with MySQL

Begin by stopping Jira, installing the MySQL server, setting it to start on every system boot, and starting the service.

[sourcecode lang="bash"] /usr/local/jira/bin/
yum install mysql mysql-server
chkconfig mysqld on
service mysqld start

Then start the mysql client, create a database and a user for Jira, and grant the new user all rights on the new database.

[sourcecode lang="bash"] mysql
create database jiradb character set utf8;
grant all privileges on jiradb.* to jira@localhost identified by '[your new password]';

Now, edit the conf/server.xml in the jira directory and change the data source as follows:

[sourcecode lang="bash"] <Resource name="jdbc/JiraDS" auth="Container" type="javax.sql.DataSource"
password="[jira user password]"
validationQuery="select 1"/>

Note that the ampersand entity in the connection string is not a formatting problem. It really does have to be the literal text ‘amp’ with an ampersand before it and a semicolon after it (i.e. &amp;), not a bare ampersand, because server.xml is an XML file.
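For illustration, a typical MySQL connection string inside the Resource element looks something like this; the exact parameters are the usual Connector/J ones and are an assumption here, not a quote from the Jira docs:

```
url="jdbc:mysql://localhost/jiradb?useUnicode=true&amp;characterEncoding=UTF8"
```

The &amp; between the two parameters is the entity the note above is talking about.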

Finally, edit the atlassian-jira/WEB-INF/classes/entityengine.xml file to set the final attribute:

[sourcecode lang="bash"] <datasource name="defaultDS" field-type-name="mysql"
use-foreign-keys="false" …

Delete the schema-name="PUBLIC" attribute immediately after the changed line to ensure that the database is populated properly.

Start Jira once again; now you can enter the setup information. Open MySQL and confirm that Jira is, in fact, using it for storage.

Creating Multiple Volumes on Amazon EC2

Those of you who have used Amazon AWS (EC2 specifically) for more than just testing will know that the / partition in an AMI does not go beyond 10GB. So, if you need more space, you need to mount more volumes. This is a beginner’s guide to doing just that.

First off, create an AMI with the EBS storage type and take note of the zone that AMI sits in. You can find this in the details pane of the ‘instances’ section of your AWS management console. Then, go over to the ‘volumes’ section and create a new volume.

Here, you need to make sure that you create the volume in the same zone as your AMI, or you won’t be able to use it with the instance. Specify the capacity of the volume; that’s all that’s required at this stage. Click create and wait until the new volume becomes available. Then right-click the volume and choose ‘Attach Volume’. Make sure you select the right instance, then specify the new device name. You can use something like ‘/dev/sdd’; just run fdisk -l in your running instance to make sure it’s not already in use.

After that, log in to your AMI and run fdisk -l again to make sure the new volume is visible. Now you need to format it for Linux to use and then mount it someplace. Here, we’re going to move the whole /var to this new volume so that our logs etc. have more free space to grow.

[sourcecode lang="bash"] mkfs.ext3 /dev/sdd
mkdir /mnt/var
mount /dev/sdd /mnt/var
cp -a /var/. /mnt/var   # -a preserves ownership and permissions
touch /mnt/var/new-vol

The last line is there so you can verify after reboot that you are indeed using the new volume. You can also run df -h after the reboot to test this, but I just feel better this way.

Finally, open up the /etc/fstab file and add an entry for the new device /dev/sdd with the mount point /var.

[sourcecode lang="bash"] /dev/sdd /var ext3 defaults 0 0

And reboot.

Bash Script to Find and Replace in a Set of Files

Ok, so this might sound childish to the more experienced admins and programmers, but I’ve often found the need to search and replace a string in multiple files. For example, if I have to work with an inexperienced programmer’s code, I might have to change the name of the database in a couple of dozen places. If you’re in such a place, you might want to try the following script:

[sourcecode lang="bash"] for f in submit_*; do
    sed "s/old_db_name/new_db_name/" < "$f" > "a_$f"
    mv "a_$f" "$f"
done
The first line matches all files that fit the pattern submit_*. For each one, the loop calls sed to make the replacement and writes the output to a_$f, then renames a_$f back to $f so that we keep the original filename. So, there you go. You can build all sorts of complicated finds and replaces with regular expressions and unleash the power of sed through this script. Ciao.
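If your sed is GNU sed (as it is on most Linux boxes), the same job can be done in place with -i, and grep -rl limits the rewrite to files that actually contain the string. A self-contained demo follows; the file and database names are made up:

```shell
# Set up two sample files in a scratch directory.
dir=$(mktemp -d)
printf 'use old_db_name;\n' > "$dir/submit_a.php"
printf 'nothing to change here\n' > "$dir/submit_b.php"

# Rewrite only the files that mention the old name.
grep -rl 'old_db_name' "$dir" | while read -r f; do
    sed -i 's/old_db_name/new_db_name/g' "$f"
done

cat "$dir/submit_a.php"    # now reads: use new_db_name;
```

The grep filter also means untouched files keep their original timestamps, which plain `sed -i` over everything would clobber.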

Web Fax for Asterisk

Over the past few weeks, I have been working with the popular telephony software Asterisk and the whole stack that stands on top of it. I have (in coordination with a friend) set up Asterisk, FreePBX, A2Billing, Fax for Asterisk and Vicidial on several production servers. Combined, these provide a complete telephony solution for a wide range of commercial organizations. As a side note, I might be writing tutorials about some of these things in the future.

One of the most problematic of these technologies was getting fax to work with Asterisk. We tried many variations and finally found that Digium’s Fax for Asterisk, or Free Fax for Asterisk (FFA), is currently the most stable and the easiest to set up. However, it does not provide an easy way to let your end users send faxes if they don’t have SIP-enabled fax machines. What’s more, there is no software available that would allow you to do that! So, in one of our projects, we had to come up with a custom interface, and we decided to open it up so that the many others who need it can benefit from our efforts and hopefully build on it.

We call this PHP-based script Web Fax for Asterisk and are releasing it under GPLv3. Those of you who just want the code can get it from the SourceForge project page. You can also get it from the SF SVN repo if you want to contribute. (In that case, gimme a shout and I’ll allow you to commit.)

For those of you who want to learn how it’s made, please read on.

Install, Configure and Execute Apache Hadoop from Source

Hadoop is Apache’s implementation of the brand-spanking new programming model called MapReduce, along with some other pieces such as the Hadoop Distributed Filesystem (HDFS). It can be used to parallelize (or distribute), and thus massively speed up, certain kinds of data processing. This tutorial covers installing, configuring and running the Hadoop framework on a single node. In a future tutorial, we might create a project that actually uses Hadoop for problem solving across multiple clustered nodes. Here, we start by looking at the setup of a single node.

Installing from source is important if you want to make changes to the Hadoop framework itself. I’ve found that it’s also the easier method if you simply want to deploy Hadoop. Whichever path you take, going with SVN is probably the best way. So, first check out the source of a stable branch. I used 0.20.2 because it was the ‘stable’ branch at the time and because I was having trouble checking out 0.20.

But before that, you need to setup the dependencies. Here they are:

  1. JDK (I found 1.6+ to be compatible with the 0.20.2 branch)
  2. Eclipse (SDK or ‘classic’. This is required for building the Hadoop Eclipse plugin. I used 3.6.1)
  3. Ant (for processing the install/configuration scripts)
  4. xerces-c (the XML parser)
  5. SSH server
  6. g++

By the way, I used Ubuntu 10.04 as my dev box. Download the binaries of Eclipse, Ant and Xerces-C, extract them in your home folder, and remember their folder names. We’ll need them later.

Install the rest of the dependencies with:

[sourcecode language="bash"] $ sudo apt-get install sun-java6-jdk ssh g++

Also, the SSH server needs to be set up so that it doesn’t require a password. You can check with ‘ssh localhost’. If it does require a password, fix that using:

[sourcecode language="bash"] $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/ >> ~/.ssh/authorized_keys

Now, go to your home directory, set up the environment variables and check out the Hadoop source:

[sourcecode language="bash"] nam@zenbox:~$ cd ~
nam@zenbox:~$ export JAVA_HOME=/usr/lib/jvm/java-6-sun
nam@zenbox:~$ export PATH=$PATH:/usr/share/apache-ant-1.8.1
nam@zenbox:~$ svn co hadoop

When you do this, you get the Hadoop source tree. (We’re skipping the details of the build here; we’ll come back to them shortly.) You can build the requirements and the examples and then test the ‘pi’ example like so:

[sourcecode language="bash"] nam@zenbox:~$ cd hadoop
nam@zenbox:~/hadoop$ ant
nam@zenbox:~/hadoop$ ant examples
nam@zenbox:~/hadoop$ bin/hadoop
nam@zenbox:~/hadoop$ bin/hadoop jar hadoop-0.20.2-examples.jar pi 10 1000000

Here’s (part of) what I got as output:

[sourcecode language="bash"] Number of Maps = 10
Samples per Map = 1000000
Wrote input for Map #0
Wrote input for Map #1

Starting Job
10/09/25 15:01:21 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/09/25 15:01:21 INFO mapred.FileInputFormat: Total input paths to process : 10
10/09/25 15:01:21 INFO mapred.JobClient: Running job: job_local_0001
10/09/25 15:01:21 INFO mapred.FileInputFormat: Total input paths to process : 10
10/09/25 15:01:21 INFO mapred.MapTask: numReduceTasks: 1
10/09/25 15:01:21 INFO mapred.MapTask: io.sort.mb = 100
10/09/25 15:01:21 INFO mapred.MapTask: data buffer = 79691776/99614720
10/09/25 15:01:21 INFO mapred.MapTask: record buffer = 262144/327680
10/09/25 15:01:22 INFO mapred.MapTask: Starting flush of map output
10/09/25 15:01:22 INFO mapred.MapTask: Finished spill 0
10/09/25 15:01:22 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting

10/09/25 15:01:24 INFO mapred.LocalJobRunner:
10/09/25 15:01:24 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
10/09/25 15:01:24 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost:9000/user/nam/PiEstimator_TMP_3_141592654/out
10/09/25 15:01:24 INFO mapred.LocalJobRunner: reduce > reduce
10/09/25 15:01:24 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
10/09/25 15:01:24 INFO mapred.JobClient: map 100% reduce 100%
10/09/25 15:01:24 INFO mapred.JobClient: Job complete: job_local_0001
10/09/25 15:01:24 INFO mapred.JobClient: Counters: 15
10/09/25 15:01:24 INFO mapred.JobClient: FileSystemCounters
10/09/25 15:01:24 INFO mapred.JobClient: FILE_BYTES_READ=1567406
10/09/25 15:01:24 INFO mapred.JobClient: HDFS_BYTES_READ=192987
10/09/25 15:01:24 INFO mapred.JobClient: FILE_BYTES_WRITTEN=199597
10/09/25 15:01:24 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1781093
10/09/25 15:01:24 INFO mapred.JobClient: Map-Reduce Framework
10/09/25 15:01:24 INFO mapred.JobClient: Reduce input groups=20
10/09/25 15:01:24 INFO mapred.JobClient: Combine output records=0
10/09/25 15:01:24 INFO mapred.JobClient: Map input records=10
10/09/25 15:01:24 INFO mapred.JobClient: Reduce shuffle bytes=0
10/09/25 15:01:24 INFO mapred.JobClient: Reduce output records=0
10/09/25 15:01:24 INFO mapred.JobClient: Spilled Records=40
10/09/25 15:01:24 INFO mapred.JobClient: Map output bytes=180
10/09/25 15:01:24 INFO mapred.JobClient: Map input bytes=240
10/09/25 15:01:24 INFO mapred.JobClient: Combine input records=0
10/09/25 15:01:24 INFO mapred.JobClient: Map output records=20
10/09/25 15:01:24 INFO mapred.JobClient: Reduce input records=20
Job Finished in 3.58 seconds
Estimated value of Pi is 3.14158440000000000000

So, now that you know Hadoop is actually running and working as it should, it’s time to set up the server. First, you need to define the node configuration in conf/core-site.xml:

[sourcecode language="bash"] <!-- Put site-specific property overrides in this file. -->
<!-- set to 1 to reduce warnings when running on a single node -->
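For reference, a minimal single-node core-site.xml for the 0.20 line looks roughly like the sketch below. Treat it as an assumption rather than gospel: the fs.default.name port matches the hdfs://localhost:9000 paths that appear in the job output above, and dfs.replication formally belongs in conf/hdfs-site.xml even though single-node writeups of that era often kept everything in one place:

```
<configuration>
  <!-- Put site-specific property overrides in this file. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <!-- set to 1 to reduce warnings when running on a single node -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```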

Also, setting the JAVA_HOME environment variable in your shell does not carry over when starting the Hadoop service, so you need to set it in conf/ as well:

[sourcecode language="bash"] # The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun

Then format the namenode specified in the configuration file above. See the help output for more details.

[sourcecode language="bash"] nam@zenbox:~/hadoop$ bin/hadoop namenode help
10/09/25 15:17:18 INFO namenode.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = zenbox/
STARTUP_MSG: args = [help] STARTUP_MSG: version = 0.20.3-dev
STARTUP_MSG: build = -r ; compiled by ‘nam’ on Sat Sep 25 11:41:00 PKT 2010
Usage: java NameNode [-format] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] 10/09/25 15:17:18 INFO namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at zenbox/

nam@zenbox:~/hadoop$ bin/hadoop namenode -format

Now you can start the service with the script and hopefully see output like the following:

[sourcecode language="bash"] nam@zenbox:~/hadoop$ bin/
starting namenode, logging to /home/nam/hadoop/hadoop-0.20.2/bin/../logs/hadoop-nam-namenode-zenbox.out
localhost: starting datanode, logging to /home/nam/hadoop/hadoop-0.20.2/bin/../logs/hadoop-nam-datanode-zenbox.out
localhost: starting secondarynamenode, logging to /home/nam/hadoop/hadoop-0.20.2/bin/../logs/hadoop-nam-secondarynamenode-zenbox.out
starting jobtracker, logging to /home/nam/hadoop/hadoop-0.20.2/bin/../logs/hadoop-nam-jobtracker-zenbox.out
localhost: starting tasktracker, logging to /home/nam/hadoop/hadoop-0.20.2/bin/../logs/hadoop-nam-tasktracker-zenbox.out

Finally, you can put a file in the hadoop filesystem, get the file listing and cat a file in the HDFS.

[sourcecode language="bash"] nam@zenbox:~/hadoop$ bin/hadoop dfs -put ~/a.txt a.txt
nam@zenbox:~/hadoop$ bin/hadoop dfs -ls
Found 1 items
-rw-r--r-- 3 nam supergroup 5 2010-09-25 15:20 /user/nam/a.txt
nam@zenbox:~/hadoop$ bin/hadoop dfs -cat a.txt
[contents of a.txt here] nam@zenbox:~/hadoop$ bin/hadoop dfs -rm a.txt
Deleted hdfs://localhost:9000/user/nam/a.txt

We’ll get to building from source in another installment of this tutorial, inshaallah.

Quoth Ubuntu, ‘Nevermore’

I installed Ubuntu two days ago and was surprisingly pleased by the responsiveness and general look and feel of the OS. I spent a day getting everything just the way I like it. It booted up really fast, and I was very pleased with the improvements in the user interface since the last time I used it. Ubuntu has surely come a long way from 8.04 to 10.04.1.

I did have two problems, though. The first one is sad, so I’ll mention it only briefly. The wireless connection drops when we have a power outage. No problem there, since the wireless router goes off too, but when the power comes back on and the wireless network becomes available again, Ubuntu refuses to reconnect automatically. Turns out it’s a ‘known bug’ and there are many ‘working scripts’ to fix it — none of them seem to work, by the way. Ubuntu will either require me to click connect manually or ask for the password I have saved in the connection manager. Either way, it needs some sort of interaction. Anyhow, I decided I could live with this and moved on. Then came the second problem, the one that’s the subject of this post. That requires that I go back to my story.

To get back to my narrative, one of the things I absolutely have to set up is the power settings on my laptop. See, I leave my laptop on at night and it sits right next to me while I sleep. So, not only must it stay awake for the downloads going on, it must also be unobtrusive and not wake me up. So, I set up my download managers and the power settings on Ubuntu so that the screen is put to sleep after a minute of inactivity. I check to make sure it works. I go to sleep, only to wake up after an hour or so because — surprise — my laptop display is still on!

Surprised, because I had just checked that it worked, I re-checked the settings to make sure I had them right. They were indeed correct. So, slightly annoyed by this time, I searched the internet and lo and behold: it’s another known bug. Turns out there’s a script you can write, ‘chown’ it and create a cron job for it, and it will turn the screen off. Problem is, it doesn’t work. So, I searched some more and found out that I can ‘xset dpms force off’ to turn the screen off manually. Fair enough. I create a little shortcut for the command, hit the icon and the screen goes off. Finally, I can get some sleep. I lie down and am just about to drift off when the screen comes back on like a zombie raised from the dead. I’d finally had enough and decided to let it rest. I covered my eyes with my hand. That didn’t work, but just as I was about to get back up to turn the laptop off, the screen went off automatically! Guess it had had enough fun for the day.

Poe had his insomniac raven; I have Ubuntu.

(P.S. To those who are wondering why I couldn’t just turn the laptop off in the first place, see the category this post is put in.)