Posts

Showing posts from April, 2014

Running a MapReduce Job (WordCount) on Hadoop Single-Node Cluster

In the previous post we saw how to install Hadoop on Ubuntu; now it's time to run our first Hadoop MapReduce job. We will use the WordCount example job, which reads text files and counts how often each word occurs. The input is a set of text files and the output is also text files, each line of which contains a word and the number of times it occurred, separated by a tab.

1. Download example input data

We will use three ebooks from Project Gutenberg for this example:

The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson
The Notebooks of Leonardo Da Vinci
Ulysses by James Joyce

Download each ebook and store the files in a local temporary directory of your choice, for example '/tmp/gutenberg'. Now we have to change the file ownership to hduser. Open a terminal and run:

    sudo chown -R hduser:hadoop /tmp/gutenberg

2. Restart the Hadoop cluster

Open a new terminal and restart your Hadoop cluster if it's not running already:

    su - hduser
    /usr/local/hadoop/bin/sta...
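To make the WordCount idea concrete, here is a minimal sketch in plain Python of what the job computes. It is not the Hadoop example itself: the function names (map_words, reduce_counts, word_count, format_output) are made up for illustration, and splitting words on whitespace is an assumption about tokenization.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every whitespace-separated word."""
    for word in line.split():
        yield word, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the counts emitted for each distinct word."""
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_words(line))
    return reduce_counts(pairs)


def format_output(counts: Dict[str, int]) -> List[str]:
    """Render one 'word<TAB>count' line per word, sorted by word."""
    return [f"{word}\t{count}" for word, count in sorted(counts.items())]
```

Calling word_count on the lines of a text file and passing the result to format_output gives lines in the same word, tab, count shape described above; Hadoop does the same work, but distributes the map and reduce steps across the cluster.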

How to Install Hadoop-1.2.1 (Single-Node Cluster) on Ubuntu-12.04, with JDK 8

In this DIY we will see how to set up a single-node Hadoop cluster backed by the Hadoop Distributed File System (HDFS), running on Ubuntu 12.04. The main goal of this tutorial is to get a simple Hadoop installation up and running so that you can play around with the software and learn more about it. This tutorial has been tested with the following software versions:

Ubuntu 12.04 (LTS)
Hadoop 1.2.1, released Aug 2013
JDK 8 update 5

Step 1: Prerequisites

1. Download Hadoop

Hadoop 1.2.1 can be downloaded from here. Select a mirror, then select the 'hadoop-1.2.1/' directory and download hadoop-1.2.1.tar.gz. I assume you have downloaded it into your '/home/user_name/Downloads' directory.

2. Install JDK 8

Hadoop requires a working Java installation. So, open up a terminal and run the following:

    sudo add-apt-repository ppa:webupd8team/java
    sudo apt-get update && sudo apt-get install oracle-java8-installer

It will take some time to dow...

Ports Used by CloudStack and its entities - Troubleshooting Cloudstack and Xen server issues

CloudStack is open source cloud computing software for creating, managing, and deploying infrastructure cloud services. It uses existing hypervisors such as KVM, VMware vSphere, and XenServer/XCP for virtualization.

What is CloudStack and who uses it? It is an application stack that allows us to deploy and manage large networks of virtual machines as a highly available, highly scalable Infrastructure as a Service (IaaS) cloud computing platform. CloudStack is used by a number of service providers to offer public cloud services, and by many companies to provide an on-premises (private) cloud offering or as part of a hybrid cloud solution.

While working with this tool I ran into many issues, and finding out the port details and the services associated with those ports helped me resolve about 90% of them. I thought of sharing some of those important port numbers so that readers can benefit from my experience. ...
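Since much of this troubleshooting comes down to whether a service is reachable on its port, a small check like the following can help. It is only a sketch using Python's standard socket module; the helper name is_port_open is made up for illustration, and the host and port you pass in are your own assumptions about your deployment, not values taken from this post.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, is_port_open("mgmt.example.local", 8080) would test a hypothetical management server host on port 8080 (both are placeholders). A False result only says the TCP connection failed from where you ran the check; a firewall in between gives the same answer as a stopped service.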

IP SLA Tracking configuration with Example (Auto Failover)

A company or branch may have two connections to the internet through two ISPs. If we want to use one connection as primary and the other as backup, we can use IP SLA to monitor the link. It detects the failure even when it is the connection between the modem and the ISP that fails, so it helps to switch over to the backup path automatically as soon as the primary path is down.

If you want to send specific traffic such as SAP and mail through ISP 1 and general internet traffic through ISP 2, based on the availability of the ISP links, you can use IP SLA with a tracking object and PBR. The other ISP then acts as the backup link for that traffic, using normal routing configuration, when the primary ISP link goes down.

As an example, I have taken a location which has two ISP links. Follow the steps below to achieve our objective.

Configuring IP SLA for ISP 1

Step 1: Creating SLA process

    ip sla 13
     icmp-echo 1.1.1.1 source-ip 1.1.1.2

(Here we monitor the IP of the ISP 1 link neighbor.)

Note: There ...
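The failover idea described above can be modeled in a few lines of Python. This is only a sketch of the decision logic, not Cisco IOS behavior: the class TrackedLink and the function choose_next_hop are made up for illustration, and the backup next hop 2.2.2.1 is a hypothetical address (only 1.1.1.1 appears in the example above).

```python
from dataclasses import dataclass


@dataclass
class TrackedLink:
    """A link whose state follows the result of the latest probe."""
    name: str
    next_hop: str
    up: bool = True

    def record_probe(self, reply_received: bool) -> None:
        # Mirrors the tracking object: no echo reply means the link is down.
        self.up = reply_received


def choose_next_hop(primary: TrackedLink, backup: TrackedLink) -> str:
    """Use the primary next hop while it is tracked as up, else the backup."""
    return primary.next_hop if primary.up else backup.next_hop


isp1 = TrackedLink(name="ISP 1", next_hop="1.1.1.1")
isp2 = TrackedLink(name="ISP 2", next_hop="2.2.2.1")  # hypothetical address
```

While probes to ISP 1 succeed, choose_next_hop(isp1, isp2) returns the ISP 1 next hop; after isp1.record_probe(False) it returns the ISP 2 next hop, and it switches back once a probe succeeds again.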