DevOps (Part 4): Configuration Management for Big Data Projects

Configuration Management Tools

Popular configuration management tools include Ansible, CFEngine, Chef, Puppet, RANCID, SaltStack, and Ubuntu Juju.

Key Considerations

  • A DevOps engineer should understand how Big Data projects are implemented and which technology platforms underlie them.
  • The choice of CM tool should be driven by the project requirements.
  • A DevOps engineer should have some hands-on experience with the chosen CM tool.

Example: Chef

What is Chef?

Chef is an open source configuration management and infrastructure automation platform. It lets you manage your IT infrastructure and applications as code. Because the infrastructure is described in code, it can be automated, tested, and reproduced with ease.

More about Chef: https://docs.getchef.com/chef_overview.html

Chef Architecture in a Nutshell

Chef typically runs in client-server mode. The Chef server comes in two types: Hosted Chef, which is a SaaS offering, and Private Chef, which is an organization-specific Chef server. Private Chef can be either open source or licensed.

The Chef client is the VM or machine that you want to manage or automate; it is called the Chef node. Chef uses a “pull” mechanism: the node asks the Chef server for any updates, rather than the server pushing changes to the node.
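The pull mechanism can be pictured with a toy model in plain Ruby. This is only a sketch: ToyServer and ToyNode are hypothetical names invented for illustration and are not part of Chef's API. It shows a node asking the server for its desired state and applying only what is missing, so a second run changes nothing.

```ruby
# Toy model of a pull-based client run. ToyServer and ToyNode are
# hypothetical names for illustration; they are not part of Chef's API.

# Holds the desired state (here, a list of package names) for each node.
class ToyServer
  def initialize
    @desired = Hash.new { |hash, key| hash[key] = [] }
  end

  def assign(node_name, packages)
    @desired[node_name] = packages.dup
  end

  def desired_state_for(node_name)
    @desired[node_name].dup
  end
end

# A managed machine that asks the server what it should look like.
class ToyNode
  attr_reader :name, :installed

  def initialize(name)
    @name = name
    @installed = []
  end

  # One "client run": pull the desired state and apply only what is missing.
  # Returns the list of packages changed during this run.
  def converge(server)
    missing = server.desired_state_for(name) - installed
    @installed.concat(missing)
    missing
  end
end

server = ToyServer.new
server.assign('web01', %w[apache2 openssl])

node = ToyNode.new('web01')
first_run = node.converge(server)   # both packages are missing, so both are applied
second_run = node.converge(server)  # state already matches, so nothing changes

puts "first run changed:  #{first_run.inspect}"
puts "second run changed: #{second_run.inspect}"
```

The real Chef client does far more (authentication, cookbook synchronization, reporting), but the shape is the same: the node initiates the request, and repeated runs converge to the same state.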

Chef can also run in a standalone mode, known as chef solo or chef zero. This mode is typically used for development and testing.

Configuration management is done using Chef cookbooks. Cookbooks contain recipes, which are assigned to the Chef node and define its behavior: for example, which node runs the Apache web server, which one hosts a DB server, and so on.
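As a rough illustration of what a recipe looks like, here is a minimal sketch for the Apache example. It assumes a hypothetical cookbook named "webserver" and a Debian/Ubuntu node, where the package and service are both called apache2; other platforms use different names.

```ruby
# recipes/default.rb in a hypothetical cookbook named "webserver"
# ASSUMPTION: Debian/Ubuntu naming, where the package and service are "apache2".

# Install the Apache package if it is not already present.
package 'apache2'

# Make sure the service is enabled at boot and currently running.
service 'apache2' do
  action [:enable, :start]
end
```

Adding this recipe to a node's run list is what marks that node as a web server; a DB server node would get a different recipe.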

Chef also supports other configuration constructs, such as roles, environments, and data bags.
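For instance, a role groups a run list and attributes under one name so that it can be applied to many nodes. The sketch below uses Chef's Ruby role format; the role name, the "webserver" recipe, and the listen_port attribute are all hypothetical and only there for illustration.

```ruby
# roles/webserver.rb (hypothetical role, for illustration only)
name 'webserver'
description 'Nodes that serve HTTP traffic'

# Recipes applied to every node that has this role.
run_list 'recipe[webserver]'

# Role-level attribute defaults; "listen_port" is a made-up attribute.
default_attributes 'webserver' => { 'listen_port' => 8080 }
```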

More can be learnt at: https://www.getchef.com/chef/

Using Chef to Deploy Hadoop, Hive, Pig, HBase

A Chef cookbook is available that can install and configure Hadoop, HBase, Hive, Pig, and other Hadoop components.

The cookbook, available at https://supermarket.getchef.com/cookbooks/hadoop/versions/1.0.4, has to be configured as per project requirements. Most likely a few changes will be needed so that it fits the existing project design.
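One common way to adapt a community cookbook without editing it is a small wrapper cookbook that depends on it and sets attributes. The following is a sketch under stated assumptions, not the cookbook's documented usage: the wrapper name "myproject_hadoop", the hostname, and in particular the attribute key shown are assumptions that should be checked against the hadoop cookbook's README before use.

```ruby
# --- metadata.rb of a hypothetical wrapper cookbook "myproject_hadoop" ---
name 'myproject_hadoop'
version '0.1.0'
depends 'hadoop', '= 1.0.4'   # pin the community cookbook version

# --- attributes/default.rb ---
# ASSUMPTION: the hadoop cookbook reads core-site.xml properties from
# node['hadoop']['core_site']; verify the key in the cookbook's README.
default['hadoop']['core_site']['fs.defaultFS'] = 'hdfs://namenode.example.com:8020'

# --- recipes/default.rb ---
# Run the community cookbook's default recipe with the attributes above.
include_recipe 'hadoop::default'
```

Keeping project-specific settings in the wrapper makes it easier to pick up newer versions of the community cookbook later.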

Using Chef to Set Up Azkaban Job Scheduler

A couple of cookbooks are available with which Azkaban can be set up and configured.

https://github.com/yieldbot/chef-azkaban2

https://github.com/RiotGames/azkaban-cookbook

These cookbooks can be extended to fit the project’s requirements.

More reads

Chef – https://www.getchef.com/chef/

Puppet – http://puppetlabs.com/

Orchestrating HBase cluster deployment using Chef – http://www.slideshare.net/rberger/orchestrating-hbase-cluster-deployment-with-ironfan-and-chef

Talk by John Martin about building and managing Hadoop cluster with Chef – https://www.getchef.com/blog/chefconf-talks/building-and-managing-hadoop-with-chef-john-martin/
