Configuring Hadoop Cluster using Ansible Playbook

What is Ansible?

Ansible is an open-source software provisioning, configuration management, and application-deployment tool enabling infrastructure as code. It runs on many Unix-like systems and can configure both Unix-like systems as well as Microsoft Windows. It includes its own declarative language to describe system configuration. Ansible was written by Michael DeHaan and acquired by Red Hat in 2015. Ansible is agentless, temporarily connecting remotely via SSH or Windows Remote Management (allowing remote PowerShell execution) to do its tasks.

What is Hadoop?

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

So let's start configuring. For this setup we need:

  1. One manager node with Ansible configured (the controller)
  2. Two DataNodes
  3. One NameNode

STEP 1: Edit your Ansible inventory to add the IP addresses of the NameNode and the DataNodes. It should look like this:
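For example (the IP addresses below are placeholders — substitute the addresses of your own nodes):

```
[NameNode]
192.168.1.10

[DataNode]
192.168.1.11
192.168.1.12
```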

STEP 2: Create a folder named HadoopRole, and inside it create an Ansible role named NameNode:

ansible-galaxy init NameNode

STEP 3: Edit the main.yml inside the tasks folder of the NameNode role.
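A sketch of what tasks/main.yml for the NameNode role might contain — the installer file names, versions, and directory paths here are assumptions, so substitute your own:

```yaml
# tasks/main.yml of the NameNode role (sketch; file names and paths are assumptions)
- name: Copy the JDK and Hadoop installers to the node
  copy:
    src: "{{ item }}"
    dest: /root/
  loop:
    - jdk-8u171-linux-x64.rpm
    - hadoop-1.2.1-1.x86_64.rpm

- name: Install JDK
  command: rpm -ivh /root/jdk-8u171-linux-x64.rpm
  ignore_errors: yes

- name: Install Hadoop
  command: rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force
  ignore_errors: yes

- name: Create the NameNode storage directory
  file:
    path: /nn
    state: directory

- name: Copy core-site.xml from the template
  template:
    src: core-site.xml
    dest: /etc/hadoop/core-site.xml

- name: Copy hdfs-site.xml from the template
  template:
    src: hdfs-site.xml
    dest: /etc/hadoop/hdfs-site.xml

- name: Format the NameNode
  shell: echo Y | hadoop namenode -format
  ignore_errors: yes

- name: Start the NameNode daemon
  command: hadoop-daemon.sh start namenode
```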

STEP 4: Now, in the templates folder of the NameNode role, create two .xml files named core-site.xml and hdfs-site.xml with the following content.

  1. core-site.xml

  2. hdfs-site.xml
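Minimal versions of the two files might look like this; the port number and the /nn directory path are assumptions and must match the rest of your NameNode setup:

```xml
<!-- core-site.xml on the NameNode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
```

```xml
<!-- hdfs-site.xml on the NameNode -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>
```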

STEP 5: Inside the HadoopRole folder, create an Ansible role named DataNode using the command

ansible-galaxy init DataNode

STEP 6: Edit the main.yml inside the tasks folder of the DataNode role.
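The DataNode tasks mirror the NameNode's, except that there is no format step and the daemon started is the DataNode. Again a sketch, with installer names and paths as assumptions:

```yaml
# tasks/main.yml of the DataNode role (sketch; file names and paths are assumptions)
- name: Copy the JDK and Hadoop installers to the node
  copy:
    src: "{{ item }}"
    dest: /root/
  loop:
    - jdk-8u171-linux-x64.rpm
    - hadoop-1.2.1-1.x86_64.rpm

- name: Install JDK
  command: rpm -ivh /root/jdk-8u171-linux-x64.rpm
  ignore_errors: yes

- name: Install Hadoop
  command: rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force
  ignore_errors: yes

- name: Create the DataNode storage directory
  file:
    path: /dn
    state: directory

- name: Copy core-site.xml from the template
  template:
    src: core-site.xml
    dest: /etc/hadoop/core-site.xml

- name: Copy hdfs-site.xml from the template
  template:
    src: hdfs-site.xml
    dest: /etc/hadoop/hdfs-site.xml

- name: Start the DataNode daemon
  command: hadoop-daemon.sh start datanode
```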

STEP 7: Now, in the templates folder of the DataNode role, create two .xml files named core-site.xml and hdfs-site.xml with the following content.

  1. core-site.xml

Here we are using a Jinja2 template to substitute the namenode_ip variable, which we will define in the next step.
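For example, the DataNode's core-site.xml template might look like this (the port number is an assumption; it must match the one in the NameNode's core-site.xml):

```xml
<!-- core-site.xml template on the DataNode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://{{ namenode_ip }}:9001</value>
  </property>
</configuration>
```

The DataNode's hdfs-site.xml would similarly set dfs.data.dir to the local storage directory (e.g. /dn).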


STEP 8: Edit the main.yml in the vars folder inside the DataNode role.
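Defining the variable is a one-liner; the IP address below is a placeholder for your NameNode's actual address:

```yaml
# vars/main.yml of the DataNode role
namenode_ip: 192.168.1.10
```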

STEP 9: Create a PlayBook as given below
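A minimal playbook, assuming it sits next to the HadoopRole folder and that roles_path in ansible.cfg points at HadoopRole:

```yaml
# hadoop.yml — maps each inventory group to its role
- hosts: NameNode
  roles:
    - NameNode

- hosts: DataNode
  roles:
    - DataNode
```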

STEP 10: Now run the playbook using

ansible-playbook <playbookname.yml>

STEP 11: Done! Check the Hadoop configuration and the DataNodes by running:

ansible NameNode -m shell -a 'hadoop dfsadmin -report'

Suman Sourav