Install Hadoop On Windows Without Cygwin Commands

4/30/2020

Apr 08, 2013. Running Hadoop on Cygwin in Windows (Single-Node Cluster). This document shows how to set up a pseudo-distributed, single-node Hadoop cluster (any stable 1.0.x version) backed by the Hadoop Distributed File System and running on Windows (I am using Windows Vista). Cygwin provides a command shell prompt on a Windows system without the need to understand or learn new commands. Cygwin is a POSIX (Portable Operating System Interface for UNIX) environment that works on both 32-bit and 64-bit systems. Cygwin packages are selected during installation according to your requirements.

Install Cygwin

Cygwin comes with a normal setup.exe to install in Windows, but there are a couple of steps you will need to pay attention to, so we will walk you through the installation.

To keep the installation small while saving bandwidth for you and Cygwin, the default installer will download only the files you need from the internet.

The default install path is C:\Cygwin, but if you don’t like to have programs installed on the root of your C: drive you can change the path.

Click next until you come to a download mirror selection. Unfortunately, the installer does not say where the mirrors are located, so in most cases you might as well just guess which mirror works best.

After you have selected a mirror, the installer will download a list of available packages for you to install.

Here is where things get a bit more intimidating. There will be hundreds of packages available, separated into multiple categories. If you don’t know what a package is, you can leave the default selection and install additional packages later by running the installer again. If you know what package you need, you can search for it and the results will be automatically filtered.

Once you click next, it will take a little while to download all the selected tools and then finish the installation.

Add Cygwin Path to Windows Environment Variable

After the installation you will have a Cygwin icon on your desktop that you can launch to open the Cygwin terminal.

This terminal starts in the C:\Cygwin\home folder, but that isn’t particularly useful because you probably don’t have any files stored there.

In this tutorial, we will take you through a step-by-step process to install Apache Hadoop on a Linux box (Ubuntu). This is a two-part process.

There are two prerequisites:

You must have Ubuntu installed and running.
You must have Java installed.
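
Both prerequisites can be checked from a terminal before going any further. The sketch below is a minimal check; it assumes only that 'java' would be on the PATH if a JDK is installed, and that the 'lsb_release' tool is present, as it normally is on Ubuntu.

```shell
# Report the Ubuntu release, if the lsb_release tool is available
if command -v lsb_release >/dev/null 2>&1; then
  lsb_release -ds
fi

# Check for a Java runtime on the PATH
if command -v java >/dev/null 2>&1; then
  java -version          # prints the version banner (to stderr)
  java_status="installed"
else
  echo "Java not found; install a JDK before continuing" >&2
  java_status="missing"
fi
```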

Part 1) Download and Install Hadoop

Step 1) Add a Hadoop system user

When prompted, enter a password, name and other details.
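
On Ubuntu, one way to create the account is the standard adduser tool. This is a minimal sketch that assumes the account is named 'hduser', the name that appears in the error message and the SSH test later in this tutorial.

```shell
# Create a dedicated account for Hadoop; adduser asks for the
# password, full name and other details interactively.
sudo adduser hduser
```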

NOTE: You may run into the following error during this setup and installation process:

'hduser is not in the sudoers file. This incident will be reported.'

This error can be resolved by logging in as the root user and giving 'hduser' permission to use sudo.
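
One common fix on Ubuntu is to add the account to the 'sudo' group, whose members are allowed to use sudo by default. This sketch assumes the 'hduser' account name and a root shell.

```shell
# Run as root: add hduser to the sudo group
adduser hduser sudo
```

Log out and back in as 'hduser' so that the new group membership takes effect.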

Step 2) Configure SSH

In order to manage nodes in a cluster, Hadoop requires SSH access.

First, switch to the 'hduser' account and create a new SSH key.

Enable SSH access to the local machine using this key.

Now test the SSH setup by connecting to localhost as the 'hduser' user.
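
Taken together, the key setup above might look like the following sketch, run as 'hduser'. The RSA key type, the empty passphrase and the default ~/.ssh/id_rsa location are assumptions, chosen so that Hadoop's scripts can log in without a prompt.

```shell
# Run as the Hadoop user (for example after: su - hduser)
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"

if command -v ssh-keygen >/dev/null 2>&1; then
  # Create a key pair with an empty passphrase unless one already exists
  [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -q -t rsa -P "" -f "$HOME/.ssh/id_rsa"

  # Authorize the new public key for logins to this machine
  cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
  chmod 600 "$HOME/.ssh/authorized_keys"
else
  echo "ssh-keygen not found; install the OpenSSH client first" >&2
fi
```

Afterwards, 'ssh localhost' should log in without asking for a password. The first connection asks you to confirm the host key.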

Note: If 'ssh localhost' returns an error, SSH may not be available on this system.

To resolve this, purge SSH and then install it again. It is good practice to purge before the start of an installation.
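
On Ubuntu, the purge and reinstall might look like this. The package name 'openssh-server' is an assumption; it is the package that normally provides the SSH daemon.

```shell
# Remove any existing SSH server installation, including its config files
sudo apt-get purge openssh-server

# Install a fresh copy of the SSH server
sudo apt-get install openssh-server
```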

Step 3) Download Hadoop

Select the stable release.

Select the tar.gz file (not the file with src).

Once the download is complete, navigate to the directory containing the tar file and extract it.

Now, rename hadoop-2.2.0 to hadoop.
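
The extract and rename steps can be sketched as follows. The archive name is an assumption based on the hadoop-2.2.0 directory named above; use the name of the file you actually downloaded.

```shell
# Assumed archive name, matching the hadoop-2.2.0 directory named in this step
archive="hadoop-2.2.0.tar.gz"

if [ -f "$archive" ]; then
  tar -xzf "$archive"        # unpacks into ./hadoop-2.2.0
  mv hadoop-2.2.0 hadoop     # rename to a version-independent name
else
  echo "$archive not found in $(pwd)" >&2
fi
```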

Part 2) Configure Hadoop

Step 1) Modify the ~/.bashrc file

Add the Hadoop environment variables to the end of ~/.bashrc.
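
A typical set of lines is sketched below. Both paths are assumptions: HADOOP_HOME presumes the renamed hadoop directory sits in your home directory, and JAVA_HOME uses the /usr/lib/jvm/default-java link that Ubuntu's default JDK package normally creates. Adjust both to your system.

```shell
# Location of the unpacked Hadoop directory (assumed path)
export HADOOP_HOME="$HOME/hadoop"

# Location of the Java installation (assumed path)
export JAVA_HOME="/usr/lib/jvm/default-java"

# Make the hadoop/hdfs commands and the start/stop scripts available
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

Adding sbin to the PATH is a convenience: in Hadoop 2.x the start and stop scripts live there rather than in bin.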

Now, load this environment configuration into the current shell by running 'source ~/.bashrc'.

Step 2) Configurations related to HDFS

Set JAVA_HOME inside the file $HADOOP_HOME/etc/hadoop/hadoop-env.sh, replacing the existing JAVA_HOME line with the path to your Java installation.
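
In a stock Hadoop 2.x hadoop-env.sh, the JAVA_HOME line simply re-exports whatever value the shell already has. Pointing it at an explicit path can avoid "JAVA_HOME is not set" errors when the daemons are started over SSH. The path below is an assumption (Ubuntu's default-java link); substitute your own JDK location.

```shell
# In $HADOOP_HOME/etc/hadoop/hadoop-env.sh (assumed JDK path)
export JAVA_HOME="/usr/lib/jvm/default-java"
```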

There are two parameters in $HADOOP_HOME/etc/hadoop/core-site.xml which need to be set:

1. 'hadoop.tmp.dir' - Specifies the directory which Hadoop uses to store its data files.

2. 'fs.default.name' - Specifies the default file system.

To set these parameters, open core-site.xml and add them between the <configuration> and </configuration> tags.
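
As an illustration, the sketch below writes a complete core-site.xml with both properties instead of pasting them in by hand, so back up the file first if you have already customized it. The data directory ($HOME/hadoop_tmp) and the NameNode address (hdfs://localhost:9000) are assumed values, not taken from this tutorial. In Hadoop 2.x the name 'fs.default.name' is deprecated in favor of 'fs.defaultFS', although the old name is still accepted.

```shell
# Assumed locations; adjust to your installation
conf_dir="${HADOOP_HOME:-$HOME/hadoop}/etc/hadoop"
tmp_dir="$HOME/hadoop_tmp"          # hypothetical data directory
mkdir -p "$conf_dir"                # already exists in a real installation

# Write a complete core-site.xml containing the two properties
cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$tmp_dir</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```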

Navigate to the directory $HADOOP_HOME/etc/hadoop

Now, create the directory mentioned in core-site.xml

Grant permissions to the directory
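
A sketch of these two steps, assuming 'hadoop.tmp.dir' points to a hypothetical directory inside the Hadoop user's home so that no root rights are needed. For a path outside the home directory you would create it with sudo and hand it over to the Hadoop user with chown instead.

```shell
# Must match the hadoop.tmp.dir value in core-site.xml (hypothetical path)
tmp_dir="$HOME/hadoop_tmp"

mkdir -p "$tmp_dir"      # create the directory, including any parents
chmod 750 "$tmp_dir"     # owner: full access, group: read and enter, others: none
```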

Step 3) MapReduce Configuration

Before you begin with these configurations, make sure the HADOOP_HOME path is set.

Exit the terminal and open it again.

To verify the path, type echo $HADOOP_HOME.

Now copy the mapred-site.xml.template file in $HADOOP_HOME/etc/hadoop to mapred-site.xml.

Open the mapred-site.xml file.

Add the MapReduce settings between the <configuration> and </configuration> tags.
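
The sketch below writes a minimal mapred-site.xml directly, which has the same effect as copying the template and pasting the property in. The property shown is an assumption: 'mapreduce.framework.name' set to 'yarn' is the setting commonly used on Hadoop 2.x to run MapReduce jobs on YARN, which fits the ResourceManager and NodeManager processes expected in Step 5.

```shell
# Assumed location; adjust to your installation
conf_dir="${HADOOP_HOME:-$HOME/hadoop}/etc/hadoop"
mkdir -p "$conf_dir"                # already exists in a real installation

# Write a minimal mapred-site.xml that runs MapReduce on YARN
cat > "$conf_dir/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
```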

Open $HADOOP_HOME/etc/hadoop/hdfs-site.xml.

Add the HDFS settings between the <configuration> and </configuration> tags.

Create the directory specified in these settings.
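
For a single-node cluster, a typical hdfs-site.xml sets the block replication factor to 1 and names a local directory for DataNode storage. Both values below are assumptions, and $HOME/hdfs_data is a hypothetical path. The sketch writes the whole file and then creates the directory it refers to.

```shell
# Assumed locations; adjust to your installation
conf_dir="${HADOOP_HOME:-$HOME/hadoop}/etc/hadoop"
data_dir="$HOME/hdfs_data"          # hypothetical DataNode storage directory
mkdir -p "$conf_dir"                # already exists in a real installation

# Write a minimal hdfs-site.xml for a single-node cluster
cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://$data_dir</value>
  </property>
</configuration>
EOF

# Create the directory named in dfs.datanode.data.dir
mkdir -p "$data_dir"
```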

Step 4) Before starting Hadoop for the first time, format HDFS
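
On Hadoop 2.x, formatting is done with the 'hdfs namenode -format' command. The sketch below assumes Hadoop is installed under $HOME/hadoop unless HADOOP_HOME says otherwise, and only reports a problem if the hdfs launcher is not found there.

```shell
hadoop_home="${HADOOP_HOME:-$HOME/hadoop}"   # assumed install location

if [ -x "$hadoop_home/bin/hdfs" ]; then
  # Initializes a new, empty file system; existing HDFS metadata is erased
  "$hadoop_home/bin/hdfs" namenode -format
else
  echo "hdfs not found under $hadoop_home/bin; check HADOOP_HOME" >&2
fi
```

Format only once, before the first start. Formatting again later wipes the file system metadata.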

Step 5) Start the Hadoop single-node cluster
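
In Hadoop 2.x the HDFS and YARN daemons are started by two scripts in the sbin directory. A sketch, again assuming the install location $HOME/hadoop when HADOOP_HOME is not set:

```shell
hadoop_home="${HADOOP_HOME:-$HOME/hadoop}"   # assumed install location

if [ -x "$hadoop_home/sbin/start-dfs.sh" ]; then
  "$hadoop_home/sbin/start-dfs.sh"    # NameNode, DataNode, SecondaryNameNode
  "$hadoop_home/sbin/start-yarn.sh"   # ResourceManager, NodeManager
else
  echo "start scripts not found under $hadoop_home/sbin" >&2
fi
```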

Using the 'jps' command, verify whether all the Hadoop-related processes are running.

If Hadoop has started successfully, the output of jps should list NameNode, NodeManager, ResourceManager, SecondaryNameNode and DataNode.
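
Checking the list by eye is easy to get wrong, so the comparison can be scripted. The helper below is hypothetical (the name missing_daemons is not part of Hadoop or the JDK): it reads jps output on standard input and prints every expected daemon that does not appear in it.

```shell
# Hypothetical helper: reads jps output on stdin and prints the name of
# every expected Hadoop daemon that does not appear in it.
missing_daemons() {
  jps_output="$(cat)"
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -w matches whole words only, so "SecondaryNameNode" does not count as "NameNode"
    printf '%s\n' "$jps_output" | grep -qw "$daemon" || echo "$daemon"
  done
}
```

Run it as 'jps | missing_daemons'. It prints nothing when all five processes are up.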

Step 6) Stopping Hadoop
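
Hadoop 2.x ships stop scripts that mirror the start scripts. A sketch, assuming the install location $HOME/hadoop when HADOOP_HOME is not set:

```shell
hadoop_home="${HADOOP_HOME:-$HOME/hadoop}"   # assumed install location

if [ -x "$hadoop_home/sbin/stop-dfs.sh" ]; then
  "$hadoop_home/sbin/stop-yarn.sh"   # stops ResourceManager and NodeManager
  "$hadoop_home/sbin/stop-dfs.sh"    # stops NameNode, DataNode, SecondaryNameNode
else
  echo "stop scripts not found under $hadoop_home/sbin" >&2
fi
```

Running 'jps' afterwards should no longer list any of the Hadoop processes.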