Configuring Hadoop Cluster using Ansible Playbook

Apr 20, 2021

What is Ansible?

Ansible is an open-source software provisioning, configuration management, and application-deployment tool enabling infrastructure as code. It runs on many Unix-like systems and can configure both Unix-like systems and Microsoft Windows. It includes its own declarative language to describe system configuration. Ansible was written by Michael DeHaan and acquired by Red Hat in 2015. Ansible is agentless: it connects to machines temporarily over SSH or Windows Remote Management (which allows remote PowerShell execution) to do its tasks.

What is Hadoop?

Apache Hadoop is a collection of open-source software utilities that makes it possible to use a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use, but it has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with the fundamental assumption that hardware failures are common and should be handled automatically by the framework.

So let's start configuring

Prerequisites

A manager node with Ansible installed and configured
2 × DataNodes
1 × NameNode

STEP 1: Edit your Ansible inventory to add the IP addresses of the NameNode and the DataNodes. It should look like this:
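Here is a minimal sketch of an inventory with one NameNode and two DataNodes. The file name hosts.ini, the group names namenode and datanode, and the IP addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical placeholders. Substitute your own nodes' addresses.

```shell
# Hypothetical inventory: group names and IPs are placeholders,
# not the values from the original post.
cat > hosts.ini <<'EOF'
[namenode]
192.0.2.10

[datanode]
192.0.2.11
192.0.2.12
EOF
```

With the file in place, `ansible all -i hosts.ini -m ping` is a quick way to confirm that the manager node can reach all three hosts over SSH.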

STEP 2: Create a folder named HadoopRole and, inside it, an Ansible role named NameNode:

mkdir HadoopRole
cd HadoopRole
ansible-galaxy init NameNode

STEP 3: Edit main.yml inside the tasks folder of the NameNode role.
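The following is one plausible NameNode tasks file, offered as a rough sketch under stated assumptions. It renders the two templates, formats the NameNode once, and starts the daemon. It assumes Java and a Hadoop 1.x release are already installed on the NameNode, that the configuration files live in /etc/hadoop, and that hadoop-daemon.sh is on the PATH. The /nn metadata directory is a hypothetical placeholder and has to match dfs.name.dir in hdfs-site.xml.

```shell
# Hypothetical NameNode tasks file; overwrites the stub that
# ansible-galaxy init generates. Assumes Java + Hadoop 1.x are installed
# and that configs live in /etc/hadoop.
mkdir -p HadoopRole/NameNode/tasks
cat > HadoopRole/NameNode/tasks/main.yml <<'EOF'
---
# Render the role's templates into the Hadoop config directory.
- name: Deploy core-site.xml
  ansible.builtin.template:
    src: core-site.xml
    dest: /etc/hadoop/core-site.xml

- name: Deploy hdfs-site.xml
  ansible.builtin.template:
    src: hdfs-site.xml
    dest: /etc/hadoop/hdfs-site.xml

# Format only once: skipped when /nn/current already exists.
- name: Format the NameNode
  ansible.builtin.command:
    cmd: hadoop namenode -format
    creates: /nn/current

# Not idempotent: hadoop-daemon.sh exits with an error if already running.
- name: Start the NameNode daemon
  ansible.builtin.command: hadoop-daemon.sh start namenode
EOF
```

The `creates` argument keeps a re-run of the playbook from re-formatting (and wiping) the NameNode metadata. On Hadoop 3 the equivalent commands are `hdfs namenode -format` and `hdfs --daemon start namenode`.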

STEP 4: Now, in the templates folder of the NameNode role, create two .xml files named “core-site.xml” and “hdfs-site.xml”, and copy in the content shown in the images below:

1. core-site.xml

2. hdfs-site.xml
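For the NameNode templates, a minimal sketch might look like the one below. It uses the Hadoop 1.x property names fs.default.name and dfs.name.dir, which Hadoop 2 and later still accept as deprecated aliases of fs.defaultFS and dfs.namenode.name.dir. Port 9000 and the /nn directory are assumptions, not values from the original images.

```shell
# Hypothetical NameNode templates; port 9000 and /nn are placeholders.
mkdir -p HadoopRole/NameNode/templates

cat > HadoopRole/NameNode/templates/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- 0.0.0.0 makes the NameNode listen on all interfaces -->
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9000</value>
  </property>
</configuration>
EOF

cat > HadoopRole/NameNode/templates/hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Directory where the NameNode stores filesystem metadata -->
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>
EOF
```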

STEP 5: Inside the HadoopRole folder, create a second Ansible role named DataNode using the command:

ansible-galaxy init DataNode

STEP 6: Edit main.yml inside the tasks folder of the DataNode role.
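A DataNode tasks file can follow the same pattern without a format step, because only the NameNode is formatted. This is again a sketch, not the author's file. It assumes Java and Hadoop 1.x are installed on each DataNode, that configuration files live in /etc/hadoop, and that hadoop-daemon.sh is on the PATH.

```shell
# Hypothetical DataNode tasks file; overwrites the ansible-galaxy stub.
mkdir -p HadoopRole/DataNode/tasks
cat > HadoopRole/DataNode/tasks/main.yml <<'EOF'
---
# Render this role's own templates (DataNode/templates).
- name: Deploy core-site.xml
  ansible.builtin.template:
    src: core-site.xml
    dest: /etc/hadoop/core-site.xml

- name: Deploy hdfs-site.xml
  ansible.builtin.template:
    src: hdfs-site.xml
    dest: /etc/hadoop/hdfs-site.xml

# Not idempotent: hadoop-daemon.sh exits with an error if already running.
- name: Start the DataNode daemon
  ansible.builtin.command: hadoop-daemon.sh start datanode
EOF
```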

STEP 7: Now, in the templates folder of the DataNode role, create two .xml files named “core-site.xml” and “hdfs-site.xml”, and copy in the content shown in the images below:
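On the DataNodes, core-site.xml has to point at the NameNode's address. Because these files are Jinja2 templates, the sketch below fills in that address from Ansible's groups variable. It assumes the NameNode is the first host in an inventory group named namenode, which is a hypothetical group name. Port 9000 and the /dn storage directory are also placeholders, and the port must match the one the NameNode listens on.

```shell
# Hypothetical DataNode templates; the group name, port and /dn are placeholders.
mkdir -p HadoopRole/DataNode/templates

cat > HadoopRole/DataNode/templates/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Rendered by Ansible: first host of the "namenode" inventory group -->
    <name>fs.default.name</name>
    <value>hdfs://{{ groups['namenode'][0] }}:9000</value>
  </property>
</configuration>
EOF

cat > HadoopRole/DataNode/templates/hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Directory where this DataNode stores HDFS blocks -->
    <name>dfs.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>
EOF
```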