CI/CD Pipeline with Github-SBT-TravisCI-Codacy-Nexus/Artifactory

This is a “HelloWorld CI Pipeline”!

This blog describes the steps taken to establish a CI / CD pipeline using SBT, Travis CI, Codacy and Nexus or Artifactory.

Github Private Repository

I created a HelloWorld Scala project in a GitHub private repository. (You can also create a public repo.)

It is recommended that you enable two-factor authentication on your GitHub account.

SBT

For those who are new to Scala and SBT, here is some information. You need to create a build.sbt. As with Ant, Maven or any other build tool, you can have “tasks” that sbt will perform, e.g. clean, compile, test, publish.

Publish is what we want to do. Publish where? If we want to publish the build to the Nexus repository, this is what build.sbt looks like:

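name := "hello-scala"
version := "1.4"
organization := "fakepath.test"
scalaVersion := "2.11.1"
libraryDependencies += "org.scalatest" %% "scalatest" % "2.1.6" % "test"

// Publish snapshots and releases to separate Nexus repositories
publishTo := {
  val nexus = "https://" + System.getenv("NEXUS_IP") + ":8081/nexus/"
  if (isSnapshot.value)
    Some("snapshots" at nexus + "content/repositories/snapshots")
  else
    Some("releases" at nexus + "content/repositories/releases")
}

// Credentials come from the environment; the realm and host arguments here are assumed
credentials += Credentials("Sonatype Nexus Repository Manager", System.getenv("NEXUS_IP"),
  System.getenv("NEXUS_USER"), System.getenv("NEXUS_PASSWORD"))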

Note the publishTo section. The nexus IP, User and password are not pushed to the repo and are taken from the environment variables.
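Since the build depends on these three environment variables, it can help to fail early when one of them is missing. The following is a minimal sketch, not part of the original setup; the function name require_env is made up for illustration.

```shell
# require_env NAME...: report every named variable that is unset or empty.
# Returns non-zero if at least one is missing. (Hypothetical helper.)
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible)
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (assumed wrapper around the publish step):
#   require_env NEXUS_IP NEXUS_USER NEXUS_PASSWORD && sbt publish
```

With a check like this, a missing variable shows up as a clear message instead of a publish attempt against a malformed URL.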

Travis CI

I have used the hosted Travis CI. For private repositories Travis CI has a SaaS offering at – https://magnum.travis-ci.com.

Travis docs suggest that they ensure our builds and artifacts from the private repo are secure and the space is not shared by any other application. Their security statement: https://billing.travis-ci.com/pages/security

Sign up with Travis CI using your GitHub account. To start with, a webhook needs to be activated manually for your private repo, as a one-time task. Then a .travis.yml file needs to be created. There is a lot you can do by scripting the .travis.yml file properly.

The .travis.yml would be added to your repo. Our file looks as below:

language: scala
jdk: oraclejdk7
sudo: false
scala:
- 2.11.1
before_install: umask 0022
script:
- sbt clean compile test
- sbt publish
notifications:
  slack:
    secure:
    rooms:
      secure:
email:
env:
  global:
  - secure:
  - secure:
  - secure:

The environment variables are added to this file in encrypted form. The environment variables used in this case are NEXUS_IP, NEXUS_USER and NEXUS_PASSWORD. Notifications can also be set. The notifications set in this case are Slack notifications, which can go to a single person or to a channel.

You can encrypt this info as:

travis encrypt NEXUS_USER=someuser --add

Note that we run “sbt publish” from Travis CI. This will generate an artifact. For a Scala project, the artifact is a jar file. The artifact can also be a Docker image. When “sbt publish” is run by Travis CI, it uses the publishTo setting in build.sbt to publish the artifact to the Nexus repository.

Note – We can (and we should) have a task like “sbt promote”, or we can write scripts in Travis CI itself which will “promote” a build. We want to publish every build to Nexus, but we do not want to “promote” every build. The promoted build is the one deployed to production. Promotion is typically a manual step, known as “one-click” deployment. It can be completely automated too, depending on the project. In fact, that is the difference between “Continuous Deployment” and “Continuous Delivery”.
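The publish versus promote distinction can be sketched as a small decision function. This is only an illustration: the mode names are made up, and “promote” stands for the hypothetical “sbt promote” task mentioned above, which does not exist in sbt out of the box.

```shell
# release_action MODE: print the steps applied to a build that passed all checks.
#   deployment -> every passing build is published and promoted automatically
#   delivery   -> every passing build is published; promotion waits for a manual click
release_action() {
  case "$1" in
    deployment) echo "publish promote" ;;
    delivery)   echo "publish" ;;
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Usage (RELEASE_MODE is an assumed variable set in the CI configuration):
#   for step in $(release_action "$RELEASE_MODE"); do sbt "$step"; done
```

In both modes every build reaches Nexus; the only difference is whether a human decides when a build goes to production.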

Note – We can read the version number of the latest build and increment it for the next build. This can be done programmatically by scripts.
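A minimal sketch of such a script, assuming the version lives in build.sbt in the form shown earlier and that the last component is a plain number (no “-SNAPSHOT” suffix, no leading zeros). Both function names are made up for illustration.

```shell
# current_version FILE: print the value of the `version := "..."` line in FILE.
current_version() {
  sed -n 's/^version := "\([^"]*\)".*/\1/p' "$1"
}

# next_version VERSION: print VERSION with its last numeric component incremented.
next_version() {
  prefix=${1%.*}      # everything before the last dot
  last=${1##*.}       # the last component
  if [ "$prefix" = "$1" ]; then
    # No dot in the input: the whole string is the number
    echo $((last + 1))
  else
    echo "$prefix.$((last + 1))"
  fi
}

# Usage:
#   next_version "$(current_version build.sbt)"
```

For the build.sbt above, which declares version 1.4, this prints 1.5.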

Codacy

Codacy is a code review and analysis tool. You can sign up at codacy.com using your GitHub account and enable the webhooks as documented by them.

Codacy also supports private repositories. You can play with the UI to see and set different metrics. However, to automate, you will need some scripts. For example, if we want to ensure that we do not build when code coverage is below 95%, that threshold needs to be configured in Codacy. Enabling Codacy for a private repo again involves some webhooks. We can have Codacy analyse the code per commit, per PR, per branch etc.
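The threshold itself is set in Codacy, but the same idea can be sketched as a build-side gate. This is an assumption for illustration only: the function name is made up, and how the coverage percentage is obtained is left open.

```shell
# coverage_ok PERCENT [THRESHOLD]: succeed if PERCENT is at least THRESHOLD.
# THRESHOLD defaults to 95. awk is used so that decimal values compare correctly.
coverage_ok() {
  awk -v c="$1" -v t="${2:-95}" 'BEGIN { exit !(c >= t) }'
}

# Usage (COVERAGE is an assumed variable holding the measured percentage):
#   coverage_ok "$COVERAGE" || { echo "coverage below threshold" >&2; exit 1; }
```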

Nexus

I have made use of Nexus to act as an “Artifact Manager”. So Nexus would be the tool which will store all the builds. Each build will be numbered and we will also have a build named “latest” which will be like a pointer to the latest promoted build. When a decision is made to promote a build, Travis CI will publish a build to Nexus & will also have a script that will update the “latest” pointer to point to this build.

Alternatives to Nexus

There are a couple of other repository management solutions available. Some offer hosted services.

  1. Artifactory http://www.jfrog.com/open-source/#os-arti
  2. Bintray http://www.jfrog.com/open-source/#os-bin

Configuration Management / Deployment

I have written a simple bash script which downloads the artifact from Nexus repo or Artifactory and simply executes it. We can run this script as a cron job.

Here is the script for fetching newly published artifacts from Artifactory repo:

#!/bin/bash

# Artifactory location
server=http://myorg.artifactoryonline.com/myrepo
repo=plugins-releases-local

# Maven artifact location
name=hello-scala_2.11
artifact=hello-scala
path=$server/$repo/$artifact/$name
# Read the <latest> version from the Maven metadata
version=$(curl -s "$path/maven-metadata.xml" | grep latest | sed "s/.*<latest>\([^<]*\)<\/latest>.*/\1/")
jar=$name-$version.jar
url=$path/$version/$jar

# Download and run the jar only if it is not already present locally
echo "$jar"
if [ ! -f "$jar" ]; then
  echo "Downloading new artifact"
  wget -q -N "$url"
  echo "Executing Jar: $(date)" >> hello-scala.log
  scala "$jar" >> hello-scala.log
fi
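To run the script as a cron job, a crontab entry is needed. The sketch below only builds the entry; the interval, script path and log path in the usage lines are placeholders, not values from the original setup.

```shell
# cron_entry MINUTES SCRIPT LOG: print a crontab line that runs SCRIPT
# every MINUTES minutes and appends its output to LOG.
cron_entry() {
  echo "*/$1 * * * * $2 >> $3 2>&1"
}

# Usage (hypothetical paths): append the entry to the current user's crontab.
#   ( crontab -l 2>/dev/null; cron_entry 5 /opt/hello/fetch.sh /var/log/hello-fetch.log ) | crontab -
```

Because the script above skips the download when the jar already exists locally, running it every few minutes is cheap when nothing new has been published.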

DevOps (Part 4): Configuration Management for Big Data Projects

Configuration Management Tools

Popular configuration management tools include Ansible, CFEngine, Chef, Puppet, RANCID, SaltStack, and Ubuntu Juju.

Key Considerations

  • A DevOps engineer should have an idea of how Big Data projects are implemented and of the underlying technology platforms
  • A decision to use the right CM tool will have to be made depending on the project requirements
  • A DevOps engineer should have some experience working on the chosen CM tool

Eg: Chef

What is Chef?

Chef is an open source configuration management and infrastructure automation platform. It gives you a way to automate your infrastructure and processes. It helps in managing your IT infrastructure and applications as code. Since your infrastructure is managed with code, it can be automated, tested and reproduced with ease.

More about Chef: https://docs.getchef.com/chef_overview.html

Chef Architecture in a Nutshell

Chef typically runs in a client-server mode. The Chef server can be of two types: Hosted Chef, which is a SaaS offering, and Private Chef, which is an organization-specific Chef server. Private Chef can be open source or licensed.

The Chef client is the VM or machine that you want to manage or automate. This is called the Chef node. It works on a “pull” mechanism, where the Chef node requests updates from the Chef server.

Chef can also run in a standalone mode, called chef-solo or chef-zero. This mode is typically used for development and testing.

The configuration management is done using Chef Cookbooks. Cookbooks contain recipes which are added to the chef node. These recipes define the behavior of the node. Eg: which node will run the Apache webserver, which will have a DB server and so on.

There are various other configurations supported by Chef: roles, environments, data bags etc.

More can be learnt at: https://www.getchef.com/chef/

Using Chef to Deploy Hadoop, Hive, Pig, HBase

A Chef cookbook is available which can install and configure Hadoop, HBase, Hive, Pig and other Hadoop components.

The cookbook, available at https://supermarket.getchef.com/cookbooks/hadoop/versions/1.0.4, would have to be configured as per project requirements. Most likely a few changes would have to be made to the cookbook so that it fits the existing project design.

Using Chef to setup Azkaban Job Scheduler

A couple of cookbooks are available with which Azkaban can be set up and configured.

https://github.com/yieldbot/chef-azkaban2

https://github.com/RiotGames/azkaban-cookbook

These cookbooks can be extended to fit the project’s requirements.

More reads

Chef – https://www.getchef.com/chef/

Puppet – http://puppetlabs.com/

Orchestrating HBase cluster deployment using Chef –  http://www.slideshare.net/rberger/orchestrating-hbase-cluster-deployment-with-ironfan-and-chef

Talk by John Martin about building and managing Hadoop cluster with Chef – https://www.getchef.com/blog/chefconf-talks/building-and-managing-hadoop-with-chef-john-martin/

DevOps (Part 3) – Continuous Delivery

Continuous Delivery includes automated testing, CI and continuous deployment, resulting in the ability to rapidly, reliably and repeatedly push out enhancements and bug fixes to customers at low risk and with minimal manual overhead.

Continuous Deployment Vs Continuous Delivery

I read a tweet once upon a time which sums up the difference – “Continuous Delivery doesn’t mean every change is deployed to production ASAP. It means every change is proven to be deployable at any time”.

[Figure: DevOps for Kohls (1)]

Deployment

Whether it is triggered automatically or by a manual “click”, the deployment itself can be automated. How it is done varies largely from project to project. We now step into what is called Configuration Management.

DevOps (Part 2): Continuous Integration

Code Review, Build and Test can be automated to achieve Continuous Integration.

Code Review Tools

Every project’s repository is usually managed by some version control tool. We can assume Git for now, since it is the most popular choice. When a developer pushes a change, a build is triggered. If the build is successful, a test job is triggered. Only after the tests pass should the commit be merged into the central repository. Typically the developer’s commit also needs a manual review.

As a design principle for DevOps, developers should not have direct push/write access to the central repo. Instead, the developer commits to an “authoritative repository”.
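The build, then test, then merge ordering can be sketched as a stage runner that stops at the first failure. This is an illustration only; in practice the CI server does this job, and the function name and the stage commands in the usage line are assumptions.

```shell
# run_stages CMD...: run each stage command in order and stop at the first failure.
run_stages() {
  for stage in "$@"; do
    echo "running: $stage"
    if ! sh -c "$stage"; then
      echo "failed: $stage" >&2
      return 1
    fi
  done
  echo "all stages passed; change is ready for review and merge"
}

# Usage (assumed stage commands):
#   run_stages "sbt compile" "sbt test"
```

A failed build means the test stage never starts, and a failed test means the change never reaches the point where it can be merged.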

Key Considerations

  • From a DevOps perspective, decisions must be made on the right versioning tool, on how users’ commits to an authoritative repo are managed, and on a code review tool. These will depend on the project’s requirements. Popular tools to be evaluated are Git, Gerrit and TeamCity.
  • A DevOps engineer will have to set up and configure the tools. For example, the Git-Gerrit integration needs to be installed, set up and configured

Eg: Gerrit

Gerrit is a web-based code review tool built on top of the git version control system. It is intended to provide a lightweight framework for reviewing every commit before it is accepted into the code base. Changes are uploaded to Gerrit but don’t actually become a part of the project until they’ve been reviewed and accepted. In many ways this is simply tooling to support the standard open source process of submitting patches which are then reviewed by the project members before being applied to the code base. However Gerrit goes a step further making it simple for all committers on a project to ensure that changes are checked over before they’re actually applied.

Gerrit can be integrated with several Build Automation tools like Jenkins. It can also be integrated with Issue Tracking systems like RedMine. Eg: When a user commits his change for a bug #123 in RedMine, the bug in RedMine will get updated.

More Reads

What is Gerrit? https://review.openstack.org/Documentation/intro-quick.html

Git-Gerrit configuration: http://www.vogella.com/tutorials/Gerrit/article.html

Implementing Gitflow with TeamForge and Gerrit – http://blogs.collab.net/teamforge/implementing-gitflow-with-teamforge-and-gerrit-part-i-configuration-and-usage

Build Automation

Generally, build automation refers to writing scripts to automate tasks like compiling, packaging, running automated tests, deploying to production and creating documentation. This section, however, talks only about building your code.

Key Considerations

  • Most projects already have a build tool, such as Ant, Maven or Gradle.
  • There might be a need for distributed builds. A build automation tool must be able to manage the dependencies between the parts being built in order to perform distributed builds.
  • A DevOps engineer may have to write configuration scripts to build artifacts

Eg: Gradle

Gradle can be integrated with GitHub. What we achieve by this is that GitHub recognizes Gradle build scripts and provides nice syntax highlighting.

Gradle can be integrated with any CI server. There is a good Jenkins plugin for Gradle. It can also be integrated with TeamCity, an extensible build server. So essentially what we achieve is: “a user’s commit triggers a job in Jenkins, which uses Gradle to build the repository”.

Gradle can be integrated with repository managers like Nexus. Once Gradle builds an artifact successfully, the artifact has to be transferred to some remote location, artifacts of older builds need to be retained, common binaries must be shared across different environments, and access to the artifacts must be secured. This is the role of a repository manager.

More Reads

Integrating Gradle with Jenkins – https://wiki.jenkins-ci.org/display/JENKINS/Gradle+Plugin

Integrating Gradle with TeamCity – http://confluence.jetbrains.com/display/TCD8/Gradle

What is Nexus? http://www.sonatype.com/nexus/why-nexus/why-use-a-repo-manager

What is a Repository Manager – http://blog.sonatype.com/2009/04/what-is-a-repository/#.VFY5s_mUd8E

Test Automation

When a user commits code and it is successfully built and deployed to a test environment, the actual test jobs need to be started on that environment. The test jobs include unit tests as well as integration tests. The testing would most likely involve creating test VMs and cleaning them up after every test run. The test results have to be relayed back to the developers and other stakeholders.
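The create, test, clean up cycle can be sketched as a wrapper that always tears the environment down, whatever the test result. In this sketch a temporary directory stands in for the test VM, which is a deliberate simplification; the function name is made up.

```shell
# with_test_env CMD...: create a scratch environment, run CMD inside it,
# and always clean up afterwards. Returns the exit status of CMD.
with_test_env() {
  env_dir=$(mktemp -d) || return 1
  # Run the tests in a subshell so the working directory change does not leak out
  ( cd "$env_dir" && "$@" ) && status=0 || status=$?
  rm -rf "$env_dir"   # cleanup happens whether the tests passed or failed
  return "$status"
}

# Usage (assumed test command):
#   with_test_env sbt test || echo "tests failed" >&2
```

Returning the test command's own exit status keeps the cleanup invisible to the CI server, which only sees pass or fail.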

Key Considerations

  • From a DevOps perspective we don’t have a “test automation tool”. What we have is an automation framework, which will involve test automation. Hence this is one of the most important aspects of deciding on a DevOps automation tool.
  • There are several CI servers, the most popular being Jenkins. Travis and BuildHive are hosted services offering some additional options. The choice of a CI server will have to be made depending on several factors.
  • The frequency of commits needs to be estimated. Will you run tests after each commit?
  • There are some tests that would run nightly
  • A DevOps engineer will have to write configuration scripts which will trigger test jobs, create VMs, give back feedback, etc.

Eg: CI Server – Jenkins

Jenkins can be configured to trigger jobs that run tests. It can spawn VMs or clusters where the tests run. Depending on the tests and data volumes, you may have to choose between open source Jenkins and the Enterprise version.

Jenkins can be integrated with Git/Gerrit. So every push can trigger a build & test job.

Jenkins can be integrated with Code Analysis tools like Checkmarx.

Jenkins can be integrated with Repository Managers like Nexus.

Jenkins can be integrated with Issue Tracking tools like RedMine.

More Reads

DevOps and Test Automation – http://www.scriptrock.com/blog/devops-test-automation

Case Study: Deutsche Telekom with Jenkins & Chef – http://www.ravellosystems.com/customer-case-studies/deutsche-telekom

Git, Gerrit Review, Jenkins Setup – http://www.infoq.com/articles/Gerrit-jenkins-hudson

Gerrit Jenkins Git – https://wiki.openstack.org/wiki/GerritJenkinsGit

Checkmarx – https://www.checkmarx.com/glossary/code-analysis-tools-boosting-application-security/

DevOps (Part 1): Introduction to DevOps

What Is DevOps?

The answer to this question can be given philosophically and technically. It is important to know the philosophy behind DevOps. But you will find this explanation in plenty of sites so I will skip this part.

Today almost everything is getting “automated”. Repetitive tasks are replaced with machines / code. Methods are being devised to address and minimise the defects and bugs in any system. Methods to minimise human error are being devised. The issues in the software development cycle are being addressed. One major issue in the SDLC was the process of how the developed code moved to production. DevOps addresses these issues. DevOps is an umbrella concept which deals with anything to smooth out the process from development to deployment into production.

DevOps Objectives

  • Introduce Code Review System
  • Automate Build
  • Automate Testing
  • Automate Deployment
  • Automate Monitoring
  • Automate Issue Tracking
  • Automate Feedback

These objectives can be achieved by setting up a Continuous Integration pipeline and a Continuous Deployment/Delivery process. Post delivery, a process for Continuous Monitoring is set up.

Points to be considered while setting up a CI/CD pipeline

  • Developers push a lot of code (many commits)
  • With every commit a build would be triggered
  • Automated tests should be run in a production-clone environment
  • Builds should be fast
  • Generate feedback (to developers, to testers, to admins, to the open source community, etc.)
  • Publish latest distributable
  • Distributable should be deployable on various environments
  • Deployment process should be reliable and secure
  • One click deploys

A Simplified CI/CD Pipeline

[Figure: DevOps for Kohls]