Configuration Management Tools
- A DevOps engineer should understand how Big Data projects are implemented and which technology platforms underlie them
- The CM tool should be chosen based on the project requirements
- A DevOps engineer should have some experience working with the chosen CM tool
What is Chef?
Chef is an open source configuration management and infrastructure automation platform. It gives you a way to automate your infrastructure and processes. It helps in managing your IT infrastructure and applications as code. Since your infrastructure is managed with code, it can be automated, tested and reproduced with ease.
More about Chef: https://docs.getchef.com/chef_overview.html
Chef Architecture in a Nutshell
Chef typically runs in client-server mode. The Chef server comes in two forms: Hosted Chef, a SaaS offering, and Private Chef, a Chef server run for a specific organization. Private Chef can be open source or licensed.
The Chef client runs on the VM or machine that you want to manage or automate, which is called the Chef node. Chef uses a “pull” mechanism: the node asks the Chef server for any updates.
Chef can also run in a standalone mode, called chef solo or chef zero, which is typically used for development and testing.
Configuration management is done using Chef cookbooks. Cookbooks contain recipes, which are added to a Chef node and define its behavior, e.g. which node runs the Apache web server, which runs a DB server, and so on.
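The pull mechanism can be illustrated with a small toy model in plain Ruby. This is only a sketch: ToyServer and ToyNode are hypothetical names invented for the illustration, and the code does not use Chef's real API or protocol. It shows only that the node initiates every check-in and applies whatever is new.

```ruby
# Toy model of a pull-based setup; not Chef's real API or wire protocol.
# ToyServer and ToyNode are hypothetical names used only for illustration.

class ToyServer
  def initialize
    @run_lists = {} # node name => array of recipe names
  end

  # An operator publishes the desired state for a node.
  def publish(node_name, run_list)
    @run_lists[node_name] = run_list.dup
  end

  # A node asks for its current desired state.
  def run_list_for(node_name)
    @run_lists.fetch(node_name, [])
  end
end

class ToyNode
  attr_reader :name, :applied

  def initialize(name, server)
    @name = name
    @server = server
    @applied = [] # recipes already applied on this node
  end

  # The node initiates the check-in; the server never pushes.
  # Returns the recipes that were newly applied during this run.
  def check_in
    desired = @server.run_list_for(@name)
    pending = desired - @applied
    @applied += pending
    pending
  end
end

server = ToyServer.new
node = ToyNode.new('web01', server)

server.publish('web01', ['apache'])
first_run = node.check_in   # applies 'apache'
second_run = node.check_in  # nothing new to apply

server.publish('web01', ['apache', 'monitoring'])
third_run = node.check_in   # picks up only the new recipe
```

In this model a change published on the server takes effect only when the node next checks in, which is the essential property of a pull-based design.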
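As a rough illustration of what a recipe looks like, here is a minimal sketch that installs and starts the Apache web server. The cookbook name "webserver" is hypothetical, and the package and service name apache2 assumes a Debian or Ubuntu node; other platforms use different names. The file is evaluated by the Chef client, not run as a standalone Ruby script.

```ruby
# recipes/default.rb in a hypothetical cookbook named "webserver".
# Assumes a Debian/Ubuntu node, where the package and service are "apache2".

# Install the Apache package using the platform's package manager.
package 'apache2'

# Make sure the service starts at boot and is running now.
service 'apache2' do
  action [:enable, :start]
end
```

Adding this recipe to a node's run list is what makes that node a web server.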
Chef also supports other configuration constructs, such as roles, environments, and data bags.
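To give a feel for one of these constructs, the following sketch defines a role using Chef's Ruby role DSL. The role name, the "webserver" cookbook it points at, and the listen_port attribute are all hypothetical and chosen only for illustration.

```ruby
# roles/webserver.rb: a hypothetical role definition.
# The cookbook "webserver" and the attribute "listen_port" are assumptions.

name 'webserver'
description 'Nodes that serve HTTP traffic'

# Recipes applied to every node that has this role.
run_list 'recipe[webserver]'

# Attribute defaults shared by all nodes with this role.
default_attributes(
  'webserver' => { 'listen_port' => 8080 }
)
```

A role groups a run list and attributes under one name, so the same definition can be assigned to many nodes.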
More can be learnt at: https://www.getchef.com/chef/
Using Chef to Deploy Hadoop, Hive, Pig, HBase
A Chef cookbook is available that can install and configure Hadoop, HBase, Hive, Pig and other Hadoop components.
The cookbook, available at https://supermarket.getchef.com/cookbooks/hadoop/versions/1.0.4, would have to be configured as per project requirements. Most likely a few changes would have to be made to the cookbook so that it fits the existing project design.
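One common way to adapt a community cookbook without editing it directly is a small wrapper cookbook that sets attributes and then includes the upstream recipes. The sketch below assumes a hypothetical wrapper cookbook named "myproject_hadoop"; the attribute keys and the NameNode hostname are assumptions and should be checked against the README of the hadoop cookbook version in use.

```ruby
# recipes/default.rb in a hypothetical wrapper cookbook "myproject_hadoop".
# Its metadata.rb would need a line: depends 'hadoop'

# Project-specific setting. The attribute path is an assumption; verify it
# against the hadoop cookbook's README. The hostname is a placeholder.
node.default['hadoop']['core_site']['fs.defaultFS'] =
  'hdfs://namenode.example.com:8020'

# Pull in the upstream cookbook's default recipe.
include_recipe 'hadoop::default'
```

Keeping project changes in a wrapper makes it easier to upgrade the upstream cookbook later.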
Using Chef to Set Up Azkaban Job Scheduler
A couple of cookbooks are available with which Azkaban can be set up and configured.
These cookbooks can be extended to fit the project’s requirements.
Puppet – http://puppetlabs.com/
Orchestrating HBase cluster deployment using Chef – http://www.slideshare.net/rberger/orchestrating-hbase-cluster-deployment-with-ironfan-and-chef
Talk by John Martin about building and managing Hadoop clusters with Chef – https://www.getchef.com/blog/chefconf-talks/building-and-managing-hadoop-with-chef-john-martin/