Build a Docker container image
Linux containers are a way to build a self-contained environment that includes software, libraries, and other tools. They’re immensely useful and this guide describes my method for creating them.
The steps for setting up Docker containers are as follows:
- Install Docker and other software tools
- Create a Dockerfile
- Build, name and tag the image
- Run your container and test
- Push to DockerHub
- Save the Docker image
- Using a Docker image on HPC systems
My workflow is primarily written for users of Windows Subsystem for Linux, but should also be useful for Linux, Windows and Mac users.
Let’s get started…
Some Terminology
An aspect of Docker that often confuses people is the difference between images and containers.
A Docker image (also called a container image) is a read-only (immutable) file that contains the source code, libraries, dependencies, tools, and other files needed for an application to run inside a Docker container. An image can be created from scratch or built on top of a previously existing image, based on the instructions contained in a Dockerfile.
A Docker container is a running instance of an image: an isolated runtime environment with all the components (code, dependencies, and libraries) needed to run the application without relying on the host machine’s dependencies. Containers are run by the Docker engine on a server, desktop machine, or cloud instance.
Docker images can be stored in a Docker registry such as Docker Hub.
Install Docker and other software tools
To build Docker images (and run containers from them), you’ll need to install Docker Desktop. Versions are available for Windows, Linux and Mac.
I also recommend Visual Studio Code for Windows and Windows Subsystem for Linux users.
Notes for Windows Subsystem for Linux (WSL2) users
If you are using Windows Subsystem for Linux, do not install Docker Desktop inside WSL. Instead, install the Windows version, which can access your WSL 2 distributions.
Once installed, start Docker Desktop from the Windows Start menu, then select the Docker icon from the hidden icons menu of your taskbar. Right-click the icon to display the Docker commands menu and select “Change Settings”.
Ensure that “Use the WSL 2 based engine” is checked in Settings > General.
Then choose which of your installed WSL 2 distributions should have Docker integration enabled under Settings > Resources > WSL Integration.
To check that the installation has worked correctly, try the following command in WSL2:
docker --version
Optional: Test that your installation works correctly by running Docker’s small hello-world test image (it is downloaded automatically from Docker Hub on first use): docker run hello-world
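The two checks above can be combined into one small shell function. This is only a sketch: check_docker is a hypothetical helper name, and it assumes that docker info exits with an error when the Docker daemon cannot be reached.

```shell
#!/bin/sh
# Report whether the docker CLI is on PATH and whether the daemon responds.
# check_docker is a hypothetical helper name, not part of Docker.
check_docker() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker: not found on PATH"
        return 1
    fi
    # Assumption: "docker info" fails when the daemon is not running
    if ! docker info >/dev/null 2>&1; then
        echo "docker: CLI found, but the daemon is not responding"
        return 2
    fi
    echo "docker: OK ($(docker --version))"
}

check_docker || echo "Start Docker Desktop and check Settings > Resources > WSL Integration."
```

If the function reports that the CLI is missing inside WSL 2, the WSL Integration setting described above is the first thing I would check.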
Create a Dockerfile
A Dockerfile is a plain text file with keywords that add elements to a Docker image. There are many keywords that can be used in a Dockerfile (documented on the Docker website), but I will keep it simple with the following outline:
- Starting point: Do you want to start with a pre-existing Docker image?
- Additions: What needs to be added? Folders? Data? Other software?
- Environment: What variables (if any) are set as part of the software installation?
Make the Dockerfile
In Visual Studio Code (or a coding editor of your choice), create a new file and save it as Dockerfile (no extension).
Choose a base image with FROM
You don’t need to create everything from scratch. Instead, you may want to choose a “base” image to add things to. For instance, if you’re using Python software, a good starting point might be an “official” Python image. You can search the Docker Hub to locate images.
Once you’ve decided on a base image and version, add it as the first line of your Dockerfile, like this:
FROM repository/image:tag
Some images are curated by Docker itself (these are the “official” images mentioned above) and do not need a repository prefix. As an example, if I wanted to create a container with Python 3.13, I would add this as the first line in my Dockerfile:
FROM python:3.13.0
When possible, you should use a specific tag (not the automatic latest tag) in FROM statements.
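The advice about pinning tags can be checked mechanically. The sketch below uses a hypothetical helper, find_unpinned_from, which lists FROM lines that have no tag or that use latest. It is deliberately simple: it does not understand registry addresses that include a port, build arguments, or references to earlier build stages.

```shell
#!/bin/sh
# Flag FROM lines that omit a tag or use ":latest".
# find_unpinned_from is a hypothetical helper, not a Docker command.
find_unpinned_from() {
    # $1: path to a Dockerfile (defaults to ./Dockerfile)
    awk 'toupper($1) == "FROM" {
        # Skip option flags such as --platform=... that precede the image name
        i = 2
        while (i < NF && $i ~ /^--/) i++
        image = $i
        if (image == "scratch") next
        if (image !~ /[@:]/ || image ~ /:latest$/) print FNR ": " $0
    }' "${1:-Dockerfile}"
}

# Demonstration on a throwaway file
demo=$(mktemp)
printf '%s\n' 'FROM python' 'FROM python:latest' 'FROM python:3.13.0' > "$demo"
find_unpinned_from "$demo"
rm -f "$demo"
```

In the demonstration, the first two lines are reported (with their line numbers) and the pinned python:3.13.0 line is not.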
Install packaged software with RUN
This step can be a bit tricky. We need to add commands to the Dockerfile to install the desired software. There are a few standard ways to do this:
- Use a Linux package manager. This is usually apt-get for Debian-based containers (e.g., Ubuntu) or yum for RedHat Linux containers (e.g., CentOS).
- Use a software-specific package manager (like pip or conda for Python).
- Use installation instructions (usually a progression of configure, make, make install).
Each of these options will be prefixed by the RUN keyword. You can join together linked commands with the && symbol; to break lines, put a backslash \ at the end of the line. RUN can execute any command inside the image during construction, but keep in mind that the only thing kept in the final image is changes to the filesystem (new and modified files, directories, etc.).
For example, suppose that your job’s executable ends up running Python and needs access to the package plantcv, as well as the Unix tool wget. Below is an example of a Dockerfile that uses RUN to install these using the system package manager (apt-get) and Python’s package manager (pip).
# Build the image based on the official Python 3.12.7 image
FROM python:3.12.7
# Our base image happens to be Debian-based, so it uses apt-get as its system package manager.
# pip already ships with the official Python image, so only wget is installed here.
RUN apt-get update && \
    apt-get install -y --no-install-recommends wget
# Use RUN to install the Python package PlantCV via pip, Python's package manager
RUN pip install plantcv==4.5.1
One of the benefits of Docker containers is their reproducibility. Therefore, I consider it good practice to specify the version of each piece of software that goes into an image when using pip. So, I’d choose pip install plantcv==4.5.1 rather than just pip install plantcv.
If you need to copy specific files (like source code) from your computer into the image, place the files in the same folder as the Dockerfile and use the COPY keyword. You could also download files within the image by using the RUN keyword and commands like wget or git clone.
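As a rough illustration of COPY, the shell snippet below creates a throwaway folder containing a script and a Dockerfile that copies it into the image. The file name analysis.py and the destination /app are hypothetical names chosen for this sketch; nothing is built until you run docker build yourself.

```shell
#!/bin/sh
# Create a throwaway build context containing a script and a Dockerfile.
# analysis.py and /app are hypothetical names chosen for this sketch.
context=$(mktemp -d)

cat > "$context/analysis.py" <<'EOF'
print("hello from inside the image")
EOF

cat > "$context/Dockerfile" <<'EOF'
FROM python:3.12.7
# Copy analysis.py from the build context (the folder holding the Dockerfile)
COPY analysis.py /app/analysis.py
EOF

cat "$context/Dockerfile"
# To build from this folder: docker build -t username/imagename:tag "$context"
```

COPY paths are resolved relative to the build context, which is why the file has to sit in (or below) the folder that holds the Dockerfile.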
Set up your environment with ENV
Your software might rely on certain environment variables being set correctly.
One common situation is that if you’re installing a program to a custom location (like a home directory), you may need to add that directory to the image’s system PATH. For example, if you installed some scripts to /home/software/bin, you could use:
ENV PATH="/home/software/bin:${PATH}"
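The ENV line above prepends a directory to PATH in the same way you would in an ordinary shell. The sketch below shows the effect outside Docker, using a temporary directory as a stand-in for /home/software/bin and a hypothetical script called hello-tool.

```shell
#!/bin/sh
# Stand-in for /home/software/bin; hello-tool is a hypothetical script name.
bindir=$(mktemp -d)
printf '%s\n' '#!/bin/sh' 'echo "hello-tool ran"' > "$bindir/hello-tool"
chmod +x "$bindir/hello-tool"

# Shell equivalent of: ENV PATH="/home/software/bin:${PATH}"
PATH="$bindir:${PATH}"
export PATH

command -v hello-tool   # prints the full path inside $bindir
hello-tool              # prints: hello-tool ran
```

Because the new directory comes first, anything installed there takes priority over a program of the same name elsewhere on PATH.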
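Some useful environment variables that I set when creating Python-based containers are shown below. They turn off output buffering so that logs appear immediately, stop Python from writing .pyc bytecode files, disable pip’s download cache to keep the image smaller, and allow pip to install into a system-managed Python environment: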
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_BREAK_SYSTEM_PACKAGES=1
So the final Dockerfile will appear as follows:
# Build the image based on the official Python version 3.12.7 image
FROM python:3.12.7
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PIP_BREAK_SYSTEM_PACKAGES=1
# Install linux packages via apt-get
RUN apt-get update && \
apt-get install wget -y --no-install-recommends
# Install Python packages via pip
RUN pip install plantcv==4.5.1
Build, name and tag the image
So far, the image has not been built. I have just compiled instructions on how to build it.
Firstly, it’s very important that you decide on a name for your image, as well as a tag. Tags are important for tracking which version of the image you’ve created (and are using). A simple tag scheme would be to use numbers (e.g. v0, v1, etc.), but you can use any system that makes sense to you. I am going to call my image plantcv and give it the tag 4.5.1a (after the version of PlantCV that it will run).
To build and tag your image, open a terminal and navigate to the folder that contains your Dockerfile:
$ cd path/to/directory
Then make sure Docker is running (there should be an icon on your status bar, and running docker info shouldn’t indicate any errors) and then run:
$ docker build -t username/imagename:tag .
In my case, my Docker username is adamdimech, so the correct command for me will be:
$ docker build -t adamdimech/plantcv:4.5.1a .
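If you build often, the naming step can be wrapped in a small script. The following is a sketch only: image_ref and build_image are hypothetical helper names and DRY_RUN is a variable invented for this example. The tag check reflects Docker’s rules for tags (letters, digits, underscores, periods and hyphens; no leading period or hyphen; at most 128 characters), but the repository name itself is not validated.

```shell
#!/bin/sh
# Compose "username/imagename:tag" after checking that the tag is valid.
# image_ref and build_image are hypothetical helper names.
image_ref() {
    user=$1 name=$2 tag=$3
    # Reject empty tags, a leading "-" or ".", and any disallowed character
    case $tag in
        '' | [-.]* | *[!A-Za-z0-9_.-]*)
            echo "invalid tag: '$tag'" >&2
            return 1 ;;
    esac
    [ "${#tag}" -le 128 ] || { echo "tag too long" >&2; return 1; }
    printf '%s/%s:%s\n' "$user" "$name" "$tag"
}

build_image() {
    ref=$(image_ref "$1" "$2" "$3") || return 1
    # With DRY_RUN=1 the command is printed instead of executed
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "docker build -t $ref ${4:-.}"
    else
        docker build -t "$ref" "${4:-.}"
    fi
}

DRY_RUN=1 build_image adamdimech plantcv 4.5.1a
```

The last line prints docker build -t adamdimech/plantcv:4.5.1a . without running it; drop DRY_RUN=1 to perform the build.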
If you get errors, try to determine what you may need to add or change to your Dockerfile and then run the build command again. Debugging a Docker build is largely the same as debugging any software installation process.
Run your container and test
You should test your Docker container locally to ensure everything is working as it should. To start a container from your image and interact with it, use the following command:
docker run -it username/imagename:tag /bin/bash
Replace the placeholders with the details of your image. In my case:
docker run -it adamdimech/plantcv:4.5.1a /bin/bash
This will start a running copy of the container and start a command line shell inside. You should see your command line prompt change to something like:
root@6ed0ab0aafb4:/#
When you’re ready to leave the container, type exit.
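Besides poking around interactively, you can run a quick non-interactive check. The sketch below assumes the image from this guide (Python with PlantCV and wget installed); smoke_test is a hypothetical helper name, and the --rm flag removes each temporary container once its command finishes.

```shell
#!/bin/sh
# Run a few non-interactive checks in throwaway containers.
# smoke_test is a hypothetical helper name.
smoke_test() {
    image=$1
    # Fails if PlantCV cannot be imported inside the container
    docker run --rm "$image" python -c "import plantcv" || return 1
    # Fails if wget is missing from the image
    docker run --rm "$image" wget --version >/dev/null || return 1
    echo "smoke test passed: $image"
}

# Usage (needs a running Docker daemon and a built image):
#   smoke_test adamdimech/plantcv:4.5.1a
```

A check like this is handy to re-run after every rebuild, before pushing the image anywhere.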
Push to DockerHub
Once your image has been successfully built and tested, you can push it to DockerHub. Pushing it to DockerHub means that it will be available for others to use. It also means that you can pull your image onto another machine, such as a high-performance computing (HPC) cluster.
To push to DockerHub, use the following command:
$ docker push username/imagename:tag
If you have not previously logged in to DockerHub, you may need to run this command beforehand:
$ docker login
Saving your Image
Unfortunately, if you have a free account on DockerHub, any container image that you have pushed there may be scheduled for removal if it is not used (pulled) at least once every 6 months (refer to the current Docker Terms of Service for the retention policy that applies to you). For this reason, it’s a good idea to save your image to a file and store it somewhere safe. The following command will create a tarball of your image:
$ docker save --output archive-name.tar username/imagename:tag
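A saved tarball is only useful if you can restore it later, which is what docker load does. The sketch below wraps both directions; archive_name, archive_image and restore_image are hypothetical helper names, and the checksum step assumes the GNU sha256sum tool (available on Linux and WSL, but not on macOS by default).

```shell
#!/bin/sh
# Derive a file name from an image reference, e.g.
# adamdimech/plantcv:4.5.1a -> adamdimech_plantcv_4.5.1a.tar
archive_name() {
    printf '%s.tar\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

# Save an image to a tarball and record a checksum beside it.
archive_image() {
    file=$(archive_name "$1")
    docker save --output "$file" "$1" || return 1
    sha256sum "$file" > "$file.sha256"
}

# Verify the checksum, then load the image back into Docker.
restore_image() {
    sha256sum --check "$1.sha256" && docker load --input "$1"
}

archive_name adamdimech/plantcv:4.5.1a
```

The final line only prints the file name that would be used (adamdimech_plantcv_4.5.1a.tar); call archive_image with your own image reference to create the archive, and restore_image with the tarball name to load it again.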
Using a Docker image on HPC systems
If you intend to run your container on a high-performance computing system, you may find that your system administrator does not support Docker.
Other programmes like Shifter are better suited to HPC systems and can run images built with Docker. For instance, to pull a Docker image, use the following command (using the previous example):
shifter pull adamdimech/plantcv:4.5.1a
Then to access the image, use the --image flag as shown below. You can also pass additional commands to it if required, for instance:
shifter --image=adamdimech/plantcv:4.5.1a plantcv-run-workflow --config "/path/to/config.json"
There are a wide variety of alternatives to Shifter for HPC systems, so check with your system admin for the particulars.