Published on 2020-09-21
Every project has a Continuous Integration (CI) pipeline, and every one of them complains that its CI is too slow. This matters more than you might think: a slow CI can be the root cause of many problems, including lackluster productivity, low morale, a high barrier to entry for newcomers, and overall suboptimal quality.
But it need not be so. Based on my experience with open-source projects and at work, I have compiled here a lengthy list of ways to simplify your CI and make it faster. I hope you will find something in here worth your time.
I also hope you will realize that this endeavour is not unlike optimizing a program: it requires some time and dedication, but you will get tremendous results. And, almost incidentally, your CI will become more secure and easier to audit.
Lastly, remember to measure and profile your changes. If a change brings no improvement, revert it.
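A low-tech way to measure needs only a POSIX-like shell. Wrap each pipeline step in a small timing function, then compare the numbers before and after a change. The timed helper below is a hypothetical sketch and not part of any CI system. Its resolution is whole seconds, which is usually enough for CI steps.

```shell
# timed COMMAND [ARGS...]: hypothetical helper, not a standard tool.
# Runs the command, prints its wall-clock duration in whole seconds on
# stderr, and returns the command's own exit status so that a failing
# step still fails the pipeline.
# Note: date +%s is not required by POSIX, but GNU, BSD and BusyBox date
# all support it.
timed() {
    timed_start=$(date +%s)
    "$@"
    timed_status=$?
    timed_end=$(date +%s)
    printf '%s took %ss\n' "$1" "$((timed_end - timed_start))" >&2
    return "$timed_status"
}
```

For example, timed make -j4 runs the build and then prints a line of the form "make took Ns" on stderr.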
This article assumes you are running a POSIX system. Windows developers, this is not the article you are looking for.
Almost certainly, your CI pipeline has to download 'something', be it a base Docker image, a virtual machine image, some packages, or maybe a few company-wide scripts. The thing is, you are downloading those every time the pipeline runs, 24/7, every day of the year. Even a small size reduction can yield big speed-ups. Remember, the network is usually the bottleneck.
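Before deciding which downloads are worth trimming, it helps to measure them. The sketch below assumes curl is installed. It wraps the curl -sSL --retry 5 invocation recommended in this article and reports the size and duration of each transfer through curl's --write-out (-w) variables size_download, time_total and url_effective. The name measure_download is a hypothetical helper name.

```shell
# measure_download URL OUTPUT: hypothetical helper, not a standard tool.
# Downloads URL to OUTPUT with the curl flags used in this article and
# reports the transferred size and total time on stderr via --write-out.
# The body goes to OUTPUT (-o); the -w report goes to stdout, which is
# redirected to stderr here. Returns curl's exit status.
measure_download() {
    curl -sSL --retry 5 -o "$2" \
        -w '%{size_download} bytes in %{time_total}s: %{url_effective}\n' \
        "$1" >&2
}
```

For example, measure_download https://example.com/base.tar.gz base.tar.gz (a placeholder URL) prints a line of the form "N bytes in Ts: URL". Logging this for every download in a pipeline makes the heaviest items obvious.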
In no particular order:
Use git clone my-repo.git --depth 1 --branch shiny-feature, instead of cloning the whole repository every time, along with every branch and that one class file that your coworker accidentally committed once.
curl and wget are equivalent, given the right command line options. Settle on using only one and stick to it. All my pipelines use: curl -sSL --retry 5. You can customize further, but that's the gist of it. Other examples: make and ninja, gcc and clang, etc.
Stick to the standard tools, such as grep and awk; there is no need for ripgrep. Prefer sh over bash for simple scripts, make over rake for builds, etc. The standard tool is most likely faster, more stable, and better documented, too.
X11, man pages, etc.
man, apropos, info, etc. Alpine Linux gets it right by splitting almost all packages between the package itself and its documentation, e.g. cmake and cmake-doc.
cmake and cmake-bash-completion.
build-base on Alpine is a meta-package gathering make, file, gcc, etc. It will bring in lots of things you do not need. Cherry-pick only what you really require and steer clear of such meta-packages.
Do not clean up with a separate RUN rm archive.tar, since it simply creates a new layer without removing the file from the previous layer. Prefer: RUN curl -sSL --retry 5 -o archive.tar foo.com/archive.tar && tar -xf archive.tar && rm archive.tar, which will not add the tar archive to the Docker image.
Libraries often come as two packages, e.g. libsdl2-dev and libsdl2-2.0. The former is the development variant, which you only need when building code that requires the SDL2 headers and libraries, while the latter is only useful to software that needs the dynamic libraries at runtime. The development packages are usually bigger. You can make good use of multi-stage Docker builds here: a first build stage with the development packages, then a final stage with only the non-development packages. In CI, you almost never need both variants installed at the same time.
apt-get install foo will install much more than foo: it will also install recommended packages that, most of the time, are completely unrelated. Always use apt-get install --no-install-recommends foo.
Disable cgo with CGO_ENABLED=0 go build ..., because it is (at the time of writing) enabled by default. The Gradle build system also has the annoying habit of running stuff behind your back. Use gradle foo -x baz to run foo and not baz.
Do not run the tests of your dependencies; gradle is the culprit here. If you are storing your git submodules in a submodules/ directory, for example, you can run only your project tests with: gradle test -x submodules:test.
gradle goes out of its way to clutter your filesystem with test reports. Of debatable usefulness locally, they are downright wasteful in CI. And generating them takes precious time, too! Disable them with:
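// build.gradle.kts (Gradle Kotlin DSL): do not generate test reports.
tasks.withType<Test> {
    useJUnitPlatform()
    // On Gradle 8 and newer, isEnabled no longer exists: use reports.html.required.set(false).
    reports.html.isEnabled = false
    reports.junitXml.isEnabled = false
}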
/etc/apk/repositories: in the main Alpine Docker image, for example, the repository https://<mirror-server>/alpine/edge/testing is not enabled. Another example: on OpenBSD or FreeBSD, you can opt in to the current branch to get the newest changes, and along with them the newest dependencies.
Download kubectl, which is a static Go binary, instead of installing lots of Kubernetes packages, if you simply need to talk to a Kubernetes cluster. Naturally, the same goes for single-file, dependency-less scripts (shell, awk, python, lua, perl, and ruby), assuming the interpreter is the right one. But this case is rarer, and you might as well vendor the script at this point.
.gitignore is the mainstream ignore file, but were you aware that Docker has the same mechanism in the form of a .dockerignore file? My advice: whitelist the files you need, e.g.:
**/*
!**/*.js
This can have a huge impact on performance, since Docker sends all the files in the build context directory to the Docker daemon (which runs inside a virtual machine on macOS), and that can be a lot. You do not want to send build artifacts, images, and other files your image does not need on every build.
docker build . -f - < Dockerfile.
A common pattern is RUN apk update && apk add curl. But did you know the apk update is not always required? You can simply do: RUN apk --no-cache add curl, which fetches the package index on the fly without storing it in the image.
Many tools have a -q flag which reduces their verbosity. Most of their output is likely to be useless, some CI systems struggle to store big pipeline logs, and you might be bottlenecked on stdout! It will also be simpler to troubleshoot your build if the log is not swamped in thousands of unrelated lines.
Use sed to edit big files in place. For example, say you want to insert a line at the top of a JavaScript file to silence linter warnings. Instead of doing:
printf '/* eslint-disable */\n\n' | cat - foo.js > foo_tmp && mv foo_tmp foo.js
which involves reading the whole file, copying it, and renaming it, we can do:
sed -i '1s#^#/* eslint-disable */ #' foo.js
which is simpler, although sed -i still writes a temporary copy of the file behind the scenes. Note that this is GNU sed syntax: BSD and macOS sed expect sed -i '' instead.
Run independent tasks concurrently with parallel or make -j.
Build systems such as make and gradle can run tasks in parallel. Make sure you are always using a CI instance with multiple cores, and pass --parallel to Gradle and -j$(nproc) to make. In rare instances, you might have to tune the level of parallelism to your particular task for maximum performance. Also, parallel is great for parallelizing arbitrary tasks.
Use gradle build --offline when the dependencies are already cached.
For shell scripts, there are ash and dash, which are said to be much faster than bash. For awk, there are gawk and mawk. For Lua, there is LuaJIT.
jar, and not using native dependencies, or
shasum.
curl -k)
Most of the above rules can be automated with a script, assuming the definition of a CI pipeline is in a text format (e.g. GitLab CI). I would suggest starting here, and teaching developers about these simple tips that really make a difference.
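As a starting point for such a script, here is a minimal sketch. It assumes a GitLab-style pipeline file and defaults to .gitlab-ci.yml, which is GitLab's default file name; you can pass any other path. The names lint_ci and lint_rule are hypothetical. The checks are line-based grep heuristics, not a YAML parser, and they cover only a few of the tips above: apt-get without --no-install-recommends, curl -k, git clone without --depth, and apk add without --no-cache.

```shell
# lint_rule PATTERN EXCLUDE MESSAGE: hypothetical helper. Warns about lines
# of $lint_file that match the extended regex PATTERN but not EXCLUDE
# (an empty EXCLUDE disables the exclusion), and sets lint_found=1.
lint_rule() {
    if [ -n "$2" ]; then
        matches=$(grep -nE -- "$1" "$lint_file" | grep -vE -- "$2" || true)
    else
        matches=$(grep -nE -- "$1" "$lint_file" || true)
    fi
    [ -z "$matches" ] && return 0
    # grep -n prefixes each line with "NUMBER:"; keep only the number.
    printf '%s\n' "$matches" | while IFS= read -r match; do
        printf '%s:%s: %s\n' "$lint_file" "${match%%:*}" "$3"
    done
    lint_found=1
}

# lint_ci [FILE]: hypothetical checker for a text CI definition.
# Prints one warning per offending line and returns 1 if any were found.
lint_ci() {
    lint_file=${1:-.gitlab-ci.yml}
    lint_found=0
    lint_rule 'apt-get .*install' '--no-install-recommends' \
        'apt-get install without --no-install-recommends'
    lint_rule 'curl( .*)? (-k|--insecure)( |$)' '' \
        'curl with TLS certificate verification disabled'
    lint_rule 'git clone' '--depth' 'git clone without --depth'
    lint_rule 'apk .*add( |$)' '--no-cache' 'apk add without --no-cache'
    return "$lint_found"
}
```

You could run lint_ci in a pre-merge job so that these tips fail the build instead of living only in review comments. Expect false positives and negatives. For example, combined short flags such as curl -sk are not caught.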
I would also suggest considering adding strict firewall rules inside CI pipelines, and making sure the setup/teardown of CI runners is very fast. Additionally, I would do everything to avoid a situation where no CI runner is available, preventing developers from working and deploying.
Finally, I would recommend leading by example with the pipelines for the tools made by DevOps Engineers in your organization.
I wish you well on your journey towards a fast, reliable and simple CI pipeline.
I noticed in my numerous projects with different tech stacks that some are friendlier towards CI pipelines than others (I am looking at you, Gradle!). If you have the luxury of choosing your technical stack, do consider how it will play out with your pipeline. I believe this is a much more important factor than discussing whether $LANG has semicolons or not, because I am convinced it can completely decide the outcome of your project.
This blog is open-source! If you find a problem, please open a GitHub issue. The content of this blog as well as the code snippets are under the BSD-3 License, which I also usually use for all my personal projects. It's basically free for every use, but you have to mention me as the original author.