Continuous Integration Pipeline
Introduction
There are a number of tasks a developer may wish to perform such as
- Compiling source code
- Packaging binaries
- Generating documentation / diagrams
- Running the test suite
- Running checks
- Static analysis
- Coverage
Issues With Complicated Tasks
Sometimes the steps required to performing these tasks can get quite complicated. If the tasks are not easy for a developer
- The developer may not bother with certain tasks. Which means they may commit changes without running all the checking tools, or inspecting the generated documentation.
- They may invoke the tools incorrectly by passing the wrong flags or omitting files.
Benefits of Running Tasks Locally
Having all the tasks run on the server is great, however, it is desirable to be able to run every task locally too. The benefits are
- It reduces the reliance on servers, which means developers can work offline.
- It provides developers with quicker feedback because they can run each stage individually (and usually faster) without having to wait until they push to the server (or create a pull request).
- It help maintainability and the development of new stages because each stage can be tested locally first.
- It keeps pull request history clean because the author can discover and fix their warnings offline rather than updating their PR to fix warnings.
Using a Task Runner to Simplify Execution
A task runner should be used to run the tasks. This allows the developer
- To easily see the list of tasks which can be run.
- To ignore the details of how each tool is invoked.
The most ubiquitous task runner is GNU Make. Using it as an example, in the makefile there would be a target for test-coverage
which defines what needs to happen to generate the coverage report. The user simply invokes make test-coverage
to generate the report without worrying about how it is generated.
Depending on the projects main programming language and build system you may choose to use a different tool such as CMake (with add_custom_target) or Python Invoke.
Consistent Tool Invocation on CI Server
It is essential all the tasks run on a continuous integration (CI) server such as Jenkins. This ensures every pull request does not unknowingly introduce bad changes.
The task runner provides a simple interface for invoking each tool. Therefore the Jenkinsfile can be a simple wrapper that calls into the task runner in the same manner as the developer does. For example
stage('build') {
steps {
sh "make build"
With this approach the developer and server are both invoking the tools in the same manner. The definition of which is mastered in the task runner code.
Consistent Tool Versioning
Using the task runner we have ensured that each developer and the server will invoke the tools in the correct manner. However if they are not running the same versions of the tools they may see different results.
Usually every repository will contain a file describing it’s dependencies (requirements.txt, debian/control, etc). This file allows the developers and server to ensure they are using the correct version. However it can get messy when the developers or server are required to build different projects which require different versions of the same dependency. Some programming languages like Python provide solutions to this such as using pipenv.
A more generic solution is to use containers to provide an isolated working environment with the correct dependencies.
Using Docker to Ensure Consistent Tool Versioning
A Dockerfile in the repository is used to instruct Docker how to create an image with all the necessary tools. The developers and the server can invoke all tasks from within the container to ensure they all get the same result.
The benefit are
- It helps new starts because there is no lengthy installation process. Simply start the container are you are up and running.
- It makes teams more autonomous. If they want to change tooling versioning they can do so by editing their dockerfile which they have ownership of rather than having to coordinate with other teams to change the version on a shared Jenkins server.
Development dockerfiles will have these main steps
-
Start from a base image
FROM ubuntu:20.04
-
Copy the file listing the dependencies
COPY ./debian/control /tmp/my-project/debian/control # Debian example COPY ./dev-requirements.txt /tmp/my-project/dev-requirements.txt # Pip example
-
Install the dependencies from the file
RUN apt-get install --yes devscripts equivs \ && mk-build-deps --install --remove --tool='apt-get --yes' # Debian example RUN apt-get install --yes python3-pip \ && pip3 install --requirement dev-requirements.txt # Pip example
Running Docker Locally
Using the CLI
-
To build the image from the dockerfile
docker image build --tag project_name .
-
To run the container
docker run --interactive --tty --mount type=bind,src=${PWD},dst=/tmp/my-project --user $(id -u):$(id -g) project_name
- Command line: We have to pass the input from the host to the input of the container (
--interactive
) and tell the process the input is coming from a terminal (--tty
). - Shared data: A mounted drive gives the container access to to the repository and gives the host access to generated artifacts (
--mount
). - User_ID: By default the ubuntu container has root UID which means all the generated artifacts will be owned by root. Use the host’s UID in the container (
--user
).
- Command line: We have to pass the input from the host to the input of the container (
-
Invoke the task runner from within the container
I have no name!@b403fd3a236c:/$ invoke build
Note:
I have no name!
is because there is no matching UID in the /etc/passwd file.
Using VSCode
- Open the project in VSCode
- F1 -> Dev Containers: Rebuild and Reopen folder in Container
- Terminal -> New Terminal (which will now be in container)
- Invoke the task runner from within the container
Running Docker on CI Server
Customize the Jenkins execution environment by instructing Jenkins to use the dockerfile checked into the repository to build the container.
pipeline {
agent {
dockerfile true
There is no need to explicitly specify any args because Jenkins will
- Add a shared volume.
- Set the working directory.
- Set the UID & GID to the same as the host.
Only Checking Modified Files
Most checking tools (flake8, mypy, etc) can be run over a directory and they will check every relevant file. Sometimes instead of checking every file you may only want to check files that have been modified. This is useful when
- Working on a legacy system where you want to introduce new rules going forward.
- The checks take a long time.
Using git a list of modified files can be determined
git diff -G'.' --diff-filter=rd --find-renames=100% --name-only --commits {TARGET-BRANCH}...{SOURCE-BRANCH}
-G"."
: ignores files with only permission changes--diff-filter=rd
: ignores files that have been renamed or deleted--find-renames=100%
: ensures files classified as renamed have the exact same contents--name-only
: only print file names
An example of a task (defined in python) that checks flake8 on modified files would look like this
def flake8(commits="master...HEAD"):
files = get_files(commits)
python_files = get_files_by_types(files, ["Python"])
if python_files:
subprocess.run(f"flake8 --count {python_files}", shell=True)
Summary
- Developers often have several tasks to perform in a repository.
- The tasks should be easy to run locally.
- A task runner provides an interface for developers and build servers to invoke tasks in the same manner.
- Containers allow
- Developers and servers to share the same environment to reduce inconsistences.
- New starts to get up and running quickly.
- Dependency installations to be isolated from the host OS.
- Git can be used to only check modified files.
Example
A minimal HelloWorld example that implements these ideas can be found on github.
Comments