George Hadjiyiannis

Software Executive, Entrepreneur, Software Architect

CI/CD Pipeline Design: Putting it all together

How the full set of pipelines is composed from the basic building blocks, and some further optimizations.

George Hadjiyiannis

20 minutes read

Pipeline Structure

This is the last part in our series on designing CI/CD pipelines. The first part defined the objectives of the pipelines and the stages that go with them. The second part covered in detail the core Build-Deploy-Test scripts, which together implement essentially all the necessary functionality. In this third and final part, we will cover how the stages described in Part 1 are composed from the building blocks described in Part 2, along with a number of optimizations that can be made. If you have not read the first two parts, I recommend you do so now, as this part relies heavily on the concepts presented in those articles.

Let's start with a short refresher. The stages are in most ways identical, differing primarily in three key variables:

  1. How/when the stage is triggered
  2. Which version of the software it builds and tests
  3. Which tests and checks it runs

The table below shows the summary for all stages described in Part 1:

Stage   | Version                  | Trigger           | Tests / Checks
Local   | Local changes            | Manual            | style checks, static code analysis, unit tests, smoke tests, specific tests by developer
Commit  | Version in commit        | Commit            | style checks, static code analysis, unit tests, smoke tests
Nightly | Snapshot at trigger time | External timer    | style checks, static code analysis, unit tests, expensive code analysis, full regression
RC      | Release candidate        | Manual by the PO  | style checks, static code analysis, unit tests, expensive code analysis, full regression
PLS     | Release candidate        | Manual by the PO  | Performance, Load, and Stress tests
SEC     | Release candidate        | Manual by the PO  | Security tests

The key functionality is implemented by three main scripts, as described in Part 2:

  • The Build script takes as parameters the tag, representing a version of the software in the source code repository, and the label, representing the artifacts in the artifact repo. It clones the corresponding code from the source code repository, builds it into deployable artifacts (e.g., RPMs, docker images) and stores them in the artifact repo under the label. It also runs most of the local static analysis, plus the unit tests.
  • The Deploy script takes as parameters the label, and the name of an environment. It spins up a raw copy of the environment (containing the OS, but none of the application artifacts), deploys the application artifacts corresponding to the label, and initializes the application. In our case, the tests are packaged as their own artifact under the same label, and can be run on the same environment (i.e., the test code is deployed on the environment along with the application artifacts).
  • The Test script takes as parameters the location of an environment, and the name of a test suite, and executes the corresponding tests against the environment. Once the tests are complete, it persists the results as well as the application logs, saves a snapshot, and spins the environment down.

Putting it all together

The following sections describe how each of the stages is put together.

Local

The Local stage is in many respects the easiest stage to build, despite the lack of CI/CD infrastructure. The “pipeline” is triggered manually by invoking the wrapper script, either from the command line or by configuring the IDE appropriately. Written as pseudo-code, the Local “pipeline” looks like this:

# get the basic parameters for the stage
$TAG="local"
$LABEL="local" + getUUID()
$TARGET_ENV="local_VM"

build $TAG $LABEL             # builds the local version of the code
deploy $LABEL $TARGET_ENV     # deploy to local VM
test $TARGET_ENV smoke_tests --no-spin-down

The tag passed to the Build script is simply hardwired to the special tag “local”, which instructs the Build script to use the code in the machine's default workspace as is. Note that this allows Local to build code that is not in the repository yet - this is essential to satisfying the requirement that a number of tests and checks are run before the changes are committed. Local is the only stage that can test changes before they make it into the code repository. The label is similarly trivial and is constructed from a UUID. The environment is also trivial, and is hardwired to a local VM resident on the developer's laptop. Finally, the test suite is set to run the smoke tests, with the caveat that the developer is expected to run additional tests specific to the modifications (s)he made. This is why the Test script is given the option --no-spin-down, which causes the environment to be left running after the tests are completed. Note that the Build script will also run all checks that do not take too long, as well as the appropriate unit tests.

Commit

The Commit stage is in some respects the most complicated. The pseudo-code looks like:

# get the parameters for the stage
$TAG="commit" + getCommitSHA()
# note that this tag needs to be created
tag getCommitSHA() $TAG             # Create the tag for the commit         
$LABEL="commit" + getCommitSHA()
$TARGET_ENV="commit_VM"

build $TAG $LABEL             # builds the version corresponding to the commit SHA
deploy $LABEL $TARGET_ENV     # deploy to Commit environment
test $TARGET_ENV smoke_tests  # smoke tests are enough

The Commit pipeline is triggered externally by an event generated by the code repository when a commit happens. The tag is non-trivial, and must be created automatically. The code first determines the unique identifier of the commit (in git, this would be the SHA corresponding to the commit), and then uses this to construct a tag identifier. It then creates the corresponding tag on the right version of the code. This allows the Build script to get the right version of the code once it starts executing. The label is similarly constructed. The environment is hardwired to an environment reserved for the Commit stage, and the test suite is once again smoke tests. Note that, as is the case in the Local stage, the Build script will run all static analysis checks that do not take too long, as well as all the appropriate unit tests. Unlike the case of the Local stage, however, the environment will be spun down once the tests complete.
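For concreteness, here is a minimal sketch of what the Commit wrapper might look like as an actual shell script. The $CI_COMMIT_SHA variable and the build.sh/deploy.sh/test.sh script names are illustrative assumptions, not the exact names used in our infrastructure; the wrapper is assumed to run inside a clone of the repository.

#!/usr/bin/env bash
# Hypothetical Commit-stage wrapper, triggered by the repository's commit event.
set -euo pipefail

SHA="${CI_COMMIT_SHA:?no commit SHA provided by the trigger}"
TAG="commit_${SHA}"
LABEL="commit_${SHA}"

git tag "$TAG" "$SHA"          # create the tag on the triggering commit
git push origin "$TAG"         # make it visible to the Build script's own clone

./build.sh  "$TAG" "$LABEL"
./deploy.sh "$LABEL" commit_VM
./test.sh   commit_VM smoke_tests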

Nightly

The Nightly stage is similar to the Commit stage. The pseudo-code looks like:

# get the basic parameters for the stage
$TAG="nightly" + getDateDDMMYYYY()
# Note that this tag needs to be created
tag HEAD $TAG
$LABEL="nightly" + getDateDDMMYYYYY()
$TARGET_ENV="nightly_VM"

build $TAG $LABEL --run-slow-checks    # include all checks this time
deploy $LABEL $TARGET_ENV              # deploy to Nightly environment
test $TARGET_ENV regression            # run all tests

The Nightly stage is triggered by a timer in the CI/CD infrastructure. As in the case of the Commit pipeline, the tag is constructed automatically, and created when the pipeline runs. In the case of Nightly, however, it corresponds to the latest code available in the repository when the pipeline runs (i.e., the HEAD of the corresponding branch). Also note that this time, the slow checks are included in the build step, since the goal of Nightly is to prevent the code degradation that can happen when some checks and tests are skipped. The label is constructed in a similar fashion to the tag. As before, the environment is hardwired to an environment dedicated to Nightly. This pipeline runs full regression testing, and also spins the environment down once the tests complete.
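The only non-obvious step is creating the tag on whatever HEAD happens to be when the timer fires. A minimal sketch, assuming the mainline branch is called main (the branch name and the script names are assumptions):

# Hypothetical Nightly wrapper, invoked by the CI/CD scheduler.
TAG="nightly_$(date +%d%m%Y)"
LABEL="nightly_$(date +%d%m%Y)"

git fetch origin
git tag "$TAG" origin/main     # tag the HEAD of the mainline branch at trigger time
git push origin "$TAG"

./build.sh  "$TAG" "$LABEL" --run-slow-checks
./deploy.sh "$LABEL" nightly_VM
./test.sh   nightly_VM regression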

RC

The RC stage is almost identical to the Nightly stage, except for the code repository tag. The pseudo-code looks like:

# get the basic parameters for the stage
$TAG=$ARGV[1]             # get the tag externally - it is already created
$LABEL="rc_" + $TAG
$TARGET_ENV="rc_VM"

build $TAG $LABEL --run-slow-checks    # include all checks this time
deploy $LABEL $TARGET_ENV              # deploy to RC environment
test $TARGET_ENV regression            # run all tests

The RC pipeline is invoked manually using the UI of the CI/CD infrastructure. Before the pipeline is invoked, the PO will have arranged for the right code changes to be cherry-picked into a short-lived release branch, and will have tagged the right version of the code manually. The corresponding tag is then passed in through the UI when the pipeline is invoked; thus the tag for the RC stage is obtained externally. The rest of the process is identical to that of Nightly (except that the target environment is now a VM dedicated to RC testing). In particular, all code checks and full regression tests are run.

PLS

The PLS stage is a bit different from the previous stages; in pseudo-code it looks like:

# get the basic parameters for the stage
$TAG=$ARGV[1]             # get the tag externally - same as RC
$LABEL="rc_" + $TAG
$TARGET_ENV="pls_VM"

# no build step
deploy $LABEL $TARGET_ENV              # deploy to PLS environment
test $TARGET_ENV pls                   # run performance, load, and stress tests

The pipeline can be invoked either manually, using the same method as RC, or automatically once the build step of RC has finished. Either way, the most noteworthy difference from RC is that the PLS stage has no build step of its own: we only run PLS tests on release candidates, so the artifacts will already have been built by the RC pipeline. For the same reason, the PLS stage does not strictly speaking need the tag, but it is easier to take the tag as input and construct the right label from it, rather than ask the user to make sure (s)he passes in the right label. Finally, the PLS stage runs the PLS tests instead of regression tests.

SEC

The SEC stage is practically identical to the PLS stage; the pseudocode is as follows:

# get the basic parameters for the stage
$TAG=$ARGV[1]             # get the tag externally - same as RC
$LABEL="rc_" + $TAG
$TARGET_ENV="sec_VM"

# no build step
deploy $LABEL $TARGET_ENV              # deploy to SEC environment
test $TARGET_ENV security              # run security tests

The SEC stage is invoked the same way as the PLS stage, and operates in an almost identical fashion, except that it runs against an environment dedicated to SEC, and runs the security tests instead of the PLS tests.

Production

While we have not raised the possibility before, it is possible to deploy to production using exactly the same structure. In fact, this is the ideal scenario, as it ensures there are no differences between the test environments and the production environment. The pseudo-code for the production pipeline looks like:

# get the basic parameters for the stage
$TAG=$ARGV[1]             # get the tag externally - same as RC
$LABEL="rc_" + $TAG
$TARGET_ENV="production"

# no build step
deploy $LABEL $TARGET_ENV                    # deploy to production environment
test $TARGET_ENV smoke_tests --no-spin-down  # make sure release succeeded

Once again there is no need for a build step - we just use the artifacts created by the corresponding RC stage run. After deployment is complete, we run the smoke tests to make sure the release succeeded. If so, the environment we released to is ready to take traffic. In the context of Blue-Green deployment, a successful smoke test would cause the switch to the new environment to happen, either automatically or manually. Note that this means the development team or their PO can cause a release even if they have no access to the production environments (for data protection and security reasons). This little detail means that development can own their release, making it possible to implement true DevOps (more on the topic of true DevOps in a future post).
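As a rough sketch of that gate, where switch_traffic.sh is a hypothetical script that stands in for whatever mechanism repoints the router or load balancer to the newly deployed environment:

# Hypothetical release gate: only switch traffic if the smoke tests pass.
./deploy.sh "$LABEL" production
if ./test.sh production smoke_tests --no-spin-down; then
    ./switch_traffic.sh production        # Blue-Green switch; implementation-specific
else
    echo "Smoke tests failed; traffic stays on the old environment" >&2
    exit 1
fi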

Optimizations

The pipelines as described so far will work correctly, but they are inefficient in their use of resources. The following are a set of optimizations that can be used to improve overall speed and resource utilization.

Skipping modules that have no modified code

Our application can be separated into a number of different modules that are packaged independently. For example, our legacy monolith forms one module and is packaged in a single RPM, while the various microservices are each a separate module packaged in its own RPM. For most pipeline runs, especially on Local and Commit, there will only be changes to a single module, and occasionally to two. However, the Build script as currently described will build all modules from scratch. While this will still produce the right outcome, it makes the builds take significantly longer than they need to. This is especially problematic on Local, where a developer might want to try a quick change in minutes, but a full re-build will take the better part of an hour.

As an optimization, the Build script can check, for each module, whether a new build is needed, and if not, skip building that module. The Build script effectively iterates over each module and translates the corresponding source code into an artifact in the artifact repo. We said earlier that the artifact repo represents a narrow interface between Build and Deploy, and that as long as each label contains an artifact (of the right version) for each module, it is not important how that artifact was built. To check whether we need to rebuild a module, we could do the following (a sketch follows the list):

  • Check if there is a pre-existing tag in the source code repository that, for the particular module, happens to overlap. We say that a tag overlaps another for a module if, for every source file in the module, the two tags contain exactly the same version of the file.
  • If we find such an overlapping tag, then there is no need to re-build the module.
  • Instead, we locate the label that corresponds to the overlapping tag, and the corresponding version of the module artifact under that label. We then re-label the artifact with the label for our current run, and move on to the next module. Since the new label contains an artifact with the correct version of the module software, the deployment will work correctly.
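A minimal sketch of this check, assuming one directory per module and, for simplicity, comparing only against the tag of the previous successful run. The module names, PREVIOUS_TAG/PREVIOUS_LABEL, and the build_module/relabel_artifact helpers are all illustrative; the git diff call is the only real command.

# Returns success (0) if the module's files differ between the two tags.
module_changed() {
    local module_dir="$1" new_tag="$2" old_tag="$3"
    ! git diff --quiet "$old_tag" "$new_tag" -- "$module_dir"
}

for module in monolith billing-service auth-service; do
    if module_changed "$module" "$TAG" "$PREVIOUS_TAG"; then
        build_module "$module" "$TAG" "$LABEL"                   # full rebuild
    else
        relabel_artifact "$module" "$PREVIOUS_LABEL" "$LABEL"    # reuse the existing artifact
    fi
done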

Ideally, the artifact repository allows multiple labels on the same artifact. If so, we simply add the new label alongside the old one on the same copy of the artifact, thus avoiding wasting space on a copy. If not, we can achieve the same effect by creating a copy of the artifact and labeling it with the new label, while leaving the old copy as is. The main reason to waste space on a copy, instead of simply overwriting the old label, is that we might still want to use the old label either in a different pipeline or to redo an old run. Fortunately, our artifact repo supports multiple labels per artifact.

One slight complication worth noting: when we skip the actual build step, we also skip the checks and tests performed during the build step. In general that should be OK since all the checks and tests we run are valid when run on just the files in the module. This means that the results of the checks and tests from that previous overlapping tag are also valid for the new tag. If that property doesn't hold, then skipping the build means that we skip tests that need to be re-run. Fortunately, we have no such cases in our system. If you do, you should consider adding another stage “Integration” and running the tests there (along with other integration tests).

Arbitrary environments

Most CI/CD systems assume that only one instance of each pipeline can run at any point in time, because of possible conflicts on resources (e.g. the build workspace, or the target environment). For example, if a second commit comes in while the previous one is still working its way through the Commit stage, the CI/CD infrastructure will typically queue the second commit while the first one is being processed. In our case, since we use the same Build, Deploy, and Test scripts for all stages, we had to make sure they cannot conflict, except for the target environment, which is in any case different for each stage. While the queuing process generally works well, it does mean that we occasionally have to wait for pipelines to complete, especially on the Commit and RC stages. The RC conflicts in particular tend to be painful, since they usually occur when we have to test a hot-fix soon after starting the test run for the release candidate at the end of a sprint. Hot-fixes generally imply fixes for critical bugs, which have a certain urgency to them. Often this means that the hot-fix has to take precedence over the sprint release candidate, causing us to interrupt the sprint release candidate run.

Given that the only conflicting resource at that point is the target environment, we can make a simple change to get around this issue: instead of dedicated target environments, we can provision them on demand when the Deploy script runs. Note that the Deploy script already spins up the environment, and that it uses the Immutable Server pattern (as explained in Part 2). This means that we can spin up an arbitrary number of environments to use with multiple instances of a given stage. In the hot-fix scenario described above, this would proceed roughly as follows:

  • The PO starts the RC pipeline for the sprint release candidate (with tag and label “Sprint12_RC1”, for example). After the Build script completes, the Deploy script provisions a raw OS machine on environment “VM_001” and deploys the artifacts for label “Sprint12_RC1”.
  • In the meantime, the PO starts another instance of the RC pipeline for the hot-fix (with tag and label “hotfix1234_RC1”). The Deploy script provisions a raw OS machine on environment “VM_002” and deploys the artifacts for label “hotfix1234_RC1”.
  • By now the first instance of the pipeline has finished deploying and calls the Test script with the target environment set to “VM_001”, and the test suite to “regression”.
  • Similarly, soon afterwards, the second instance of the pipeline invokes the Test script with target environment “VM_002” and test suite “regression”.

In this fashion, the two instances of the RC pipeline can proceed in parallel without interfering with each other, and without having to queue. The necessary modifications are surprisingly minor. The main change is that the Deploy script has to provision an environment, or obtain one from a pool of pre-provisioned environments, instead of having the environment hard-wired. The Test script already takes the target environment as a parameter, and spins it down after it is done. It should be obvious that this technique is applicable to all stages other than Local (where it would be difficult to run multiple VMs on a laptop).
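In pseudo-code terms, the only change to the stage wrappers is that the target environment is now generated per run rather than hard-wired. The provision_environment helper and the idea of leasing from a pool are assumptions about how environments would be obtained:

# Each pipeline instance gets its own environment instead of a hard-wired one.
TARGET_ENV="rc_$(uuidgen)"                 # unique per run, e.g. rc_3f2a9c...
provision_environment "$TARGET_ENV"        # or lease one from a pre-provisioned pool

./deploy.sh "$LABEL" "$TARGET_ENV"
./test.sh   "$TARGET_ENV" regression       # Test spins the environment down when done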

Repository for local builds

Our expectation is that developers will generally run the Local stage very frequently, incrementally making changes that they have not committed yet. This means that a large number of artifacts are generated as a result of Local. While these could be stored in the central artifact repository, they would waste space and cause significant clutter in the label namespace of the repo. Furthermore, this adds no value, since it is unlikely that someone else would want to deploy these artifacts from the central repository (primarily because they cannot obtain the corresponding source code, as it is not committed yet). For this reason, it makes sense to have a local artifact repository for Local, and to clean it up on a regular basis. All artifacts built as a result of Local would reside in, and be obtained from, this local repository.

Note that this optimization interacts with the previous optimization of skipping the build for modules that have not been modified. In particular, the overlapping tags, corresponding labels, and related artifacts are generally only found in the central repository. If both optimizations are implemented, then the Build script must copy to the local artifact repo the overlapping labels and artifacts from the central repo.

Cleaning up

Even with Local using a local artifact repository, old artifacts and labels can quickly accumulate if you have a development organization with significant velocity. Even with disk space being cheap, the clutter (especially in the label namespace) will quickly drive you to think about retention policies and cleaning up. In general, most of the weight and noise will accumulate from the Commit stage. Remember that the purpose of the Commit stage is simply to make it easier for developers to not get in each other's way while working on the same code. This means that it does not make much sense to keep a long history of Commit artifacts, as you are unlikely to try to reproduce those test runs. In general, it would not make sense to keep Commit artifacts and labels for longer than a sprint. The same applies to Nightly and its artifacts. RC artifacts, on the other hand, can be kept for significantly longer, especially if you have significant auditability requirements for anything that found its way to production. When it comes to RC artifacts, I would recommend that you abuse the low cost of disk space to the extent you can.

Note that when you clean up artifacts and labels you need to maintain the artifact repo invariant we expressed earlier: even after deletion, each remaining label should have exactly one artifact of the right version for each module needed by the application. In short, don't delete an artifact while there is still a label that refers to it. This becomes a bit tricky if you have implemented the optimization of skipping the build for modules that have not changed by using multiple labels. Imagine that a module has not changed between labels A and K, and that label A has aged beyond the retention time. Because of the way the optimization above works, the artifact stored in the repo was probably built as part of the run that created label A, and therefore also has an age greater than the retention time, but cannot be deleted because it is still needed by label K. As a result, you cannot delete artifacts just based on their age. My recommendation would be to expire labels instead of artifacts, and delete artifacts only when there is no longer any label referring to them.
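As a sketch of that policy, with repo-cli standing in for whatever query interface your artifact repository offers (every repo-cli sub-command shown here is hypothetical, and the 14-day window simply approximates a sprint):

# Expire labels by age, then delete only artifacts that no label refers to.
CUTOFF=$(date -d '14 days ago' +%s)        # retention window for Commit/Nightly labels

for label in $(repo-cli list-labels --prefix commit_ --prefix nightly_); do
    created=$(repo-cli label-created-at "$label")      # creation time in epoch seconds (assumed)
    if [ "$created" -lt "$CUTOFF" ]; then
        repo-cli delete-label "$label"
    fi
done

for artifact in $(repo-cli list-artifacts); do
    if [ -z "$(repo-cli labels-of "$artifact")" ]; then
        repo-cli delete-artifact "$artifact"            # no label left, safe to delete
    fi
done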

Moving to Docker and kubernetes

The original design was created while we were still heavily using our legacy packaging and deployment mechanism: RPMs. However, we were at the same time executing a transition from RPMs and bare metal, to Docker, kubernetes, and Cloud. We therefore made sure that the design would accommodate an easy transition to the new packaging and deployment mechanisms.

Packaging is effectively the last step in assembling the artifacts, after the source code has been “built”. In the RPM domain, this involved installing the various binaries and scripts into the right directories, and adding scripts for setting up the prerequisites to the module (e.g., creating the necessary users and groups, and setting permissions). In the Docker domain, there is no installation necessary. The binaries and scripts are added to the overlay filesystem in the image with the right permissions, and the start command can do any setup that has not already been accomplished as part of building the image. This simply involves creating a Dockerfile that includes the right steps, and replacing the entire RPM packaging process with a docker build command. In fact, the microservices in our system always had the ability to deploy either as RPMs or as docker images. For the microservice modules, effectively this amounted to the simplification of removing the RPM step. Note that since the Dockerfiles now form part of the build process, they must also be maintained in source control, alongside the module they represent.
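For a single microservice module, the packaging step before and after the transition might look roughly like this; the image name, registry URL, and file paths are illustrative:

# Before: package the compiled binaries as an RPM
# rpmbuild -bb packaging/billing-service.spec

# After: bake the same binaries into a Docker image stored under the current label
docker build -t registry.example.com/billing-service:"$LABEL" billing-service/
docker push registry.example.com/billing-service:"$LABEL"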

The only other significant change was to the Deploy script. Here, instead of using yum or a similar RPM package manager to install the application, we simply use kubectl apply -f on a number of kubernetes specs to create the full deployment. Once again, since these kubernetes specs form part of the deployment process, they must be maintained in source control along with the modules they refer to.

An additional change that is worth noting is that in a kubernetes-based system, the idea of separate target environments does not make sense. Kubernetes makes use of what is effectively a general purpose infrastructure where any application can be deployed. Instead of having multiple target environments, we used kubernetes namespaces. The Deploy script creates a namespace in kubernetes instead of provisioning a new target environment. The same kubernetes infrastructure can be used for all stages except Local, and even all instances of the same stage, and the Deploy script will simply put them in different namespaces. This allows us to amortize the cost of the kubernetes infrastructure (including set-up and maintenance cost) across all of our stages. To make sure the Test script instances can access the right system under test, the Deploy script can create uniquely named services or ingress controllers on different URLs, and pass the corresponding URLs as targets to the corresponding Test script instances.
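A sketch of the kubernetes-flavoured Deploy step follows, with the namespace derived from the label and the test URL scheme as illustrative assumptions:

# One namespace per pipeline run instead of one dedicated environment per stage.
NAMESPACE="rc-$(echo "$LABEL" | tr 'A-Z_' 'a-z-')"   # namespace names must be DNS-safe

kubectl create namespace "$NAMESPACE"
kubectl apply -f k8s/ --namespace "$NAMESPACE"       # specs are versioned alongside the modules

# Expose the system under test on a unique URL and hand it to the Test script
TARGET_URL="https://${NAMESPACE}.test.example.com"
./test.sh "$TARGET_URL" regression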

There are, of course, countless other refinements and tweaks that can be made, but the purpose of these posts was mainly to illustrate the concept of designing a CI/CD pipeline set to solve the specific problems we were facing. I may revisit some of the other refinements in future posts.
