Dockerfile: what is behind the format

The open-source software Docker has established itself as the standard for container virtualization. Container virtualization is the next step in the evolution of virtual machines, with one significant difference: instead of simulating a complete operating system, only a single application is virtualized in a container. Today, Docker containers are used in all phases of the software lifecycle, including development, testing, and operations.

There are various concepts in the Docker ecosystem. Knowing and understanding these components is essential to working with Docker effectively. In particular, these include Docker images, Docker containers, and Dockerfiles. We will explain some background information and give practical tips for use.

What is a Dockerfile?

A Dockerfile is the building block in the Docker ecosystem. It describes the steps for creating a Docker image. The flow of information follows this central model: Dockerfile > Docker image > Docker container.

A Docker container has a limited lifetime and interacts with its environment. Think of a container as a living organism, such as a single-celled yeast cell. Following this analogy, a Docker image is roughly equivalent to genetic information. All containers created from a single image are identical, just as all single-celled organisms cloned from the same genetic information are. So, how do Dockerfiles fit into this model?

A Dockerfile defines the steps for creating a new image. Everything always starts with an existing base image. The newly created image descends from the base image and differs from it by a number of specific changes. To return to our yeast cell analogy, these changes correspond to mutations. A Dockerfile specifies two things for a new Docker image:

  1. The base image from which the new image is derived. This anchors the new image in the Docker ecosystem family tree.
  2. A number of specific changes that distinguish the new image from the base image.
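
As a minimal sketch of these two parts, the following Dockerfile derives a new image from the “busybox” base image and applies one change. The file name “/greeting.txt” and its content are hypothetical and only serve as an illustration:

```dockerfile
# 1. The base image from which the new image is derived
FROM busybox

# 2. A specific change that distinguishes the new image from the base image:
#    write a (hypothetical) greeting file into the image
RUN echo "Hello from the derived image" > /greeting.txt
```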

How does a Dockerfile work and how is an image created from it?

Basically, a Dockerfile is just a normal text file. The Dockerfile contains a set of instructions, each on a separate line. The instructions are executed one after the other to create a Docker image. You may be familiar with this idea from running a batch processing script. During execution, more layers are added to the image step by step. We explain exactly how this works in our article on Docker images.

A Docker image is created by executing the instructions in a Dockerfile. This step is called the build process and is started by executing the “docker build” command. The “build context” is a central concept. This defines which files and directories the build process has access to. Here, a local directory serves as the source. The contents of the source directory are passed to the Docker daemon when “docker build” is called. The instructions in the Dockerfile get access to the files and directories in the build context.

Sometimes you don't want to include all files present in the local source directory in the build context. You can use the .dockerignore file for this. This is used to exclude files and directories from the build context. The name is borrowed from Git's .gitignore file. The leading period in the file name indicates that it is a hidden file.
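
The interplay of build context and .dockerignore can be sketched as follows. The image name “my_image” and the ignore patterns are assumptions chosen for illustration; the build call and the .dockerignore content are shown as comments:

```dockerfile
# Started from the project directory; "." makes that directory the build context:
#   docker build -t my_image .
#
# A hypothetical .dockerignore next to this Dockerfile could contain:
#   .git
#   *.log

FROM busybox

# Copies everything from the build context that was not excluded
# by .dockerignore into the /app/ directory of the image
COPY . /app/
```

Files matched by the patterns are never passed to the Docker daemon, so the COPY instruction cannot see them.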

How is a Dockerfile structured?

A Dockerfile is a plain text file that is named “Dockerfile” by default. Please note that the first letter is capitalized. A file with a different name can be passed to “docker build” using the “-f” option. The file contains one entry per line. Here is the general structure of a Dockerfile:

# Comment
INSTRUCTION arguments

In addition to comments, Dockerfiles contain instructions and arguments. They describe the structure of the image.

Comments and parser directives

Comments contain information primarily intended for humans. Comments in a Dockerfile start with a hash sign (#), as they do in Python, Perl and Ruby. Comment lines are removed during the build process before further processing. Please note that only lines that begin with a hash sign are recognized as comment lines.

Here is a valid comment:

# Our base image
FROM busybox

In contrast, there is an error below because the hash sign is not at the beginning of the line:

FROM busybox # our base image

Parser directives are a special kind of comment. They are located in comment lines and must be at the beginning of the Dockerfile. Otherwise, they will be treated as comments and removed during the build. It is also important to note that a given parser directive can only be used once in a Dockerfile.

At the time of writing, only two types of parser directives exist: “syntax” and “escape”. The “escape” parser directive defines the escape symbol to be used. This is used to write instructions over several lines, as well as to express special characters. The “syntax” parser directive specifies the rules the parser must use to process Dockerfile instructions. Here is an example:

# syntax=docker/dockerfile:1
# escape=\
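
To illustrate what the escape symbol is used for, here is a minimal sketch in which an instruction is continued over two lines. The echoed texts are arbitrary placeholders:

```dockerfile
# escape=\
FROM busybox

# The escape symbol at the end of a line continues the instruction
# on the next line
RUN echo "first line" && \
    echo "second line"
```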

Instructions, arguments and variables

Instructions make up most of the Dockerfile’s content. Instructions describe the specific structure of a Docker image and are executed one after the other. Like commands on the command line, instructions take arguments. Some instructions are directly comparable to specific command line commands. So, there is a COPY instruction which copies files and directories and is roughly equivalent to the cp command on the command line. However, a difference from the command line is that some Dockerfile instructions have specific rules for their sequence. Furthermore, certain instructions can appear only once in a Dockerfile.

Note

Instructions do not have to be capitalized. You should still follow the convention of writing them in uppercase, as this makes them easier to distinguish from their arguments.

For arguments, you must make a distinction between hard-coded and variable parts. Docker follows the “twelve-factor app” methodology and uses environment variables to configure containers. The ENV instruction is used to define environment variables in a Dockerfile. Let’s take a look at how such a variable is defined and then used.

The values stored in environment variables can be read and used as variable parts of arguments. A special syntax is used for this purpose. It is reminiscent of shell scripts. The name of the environment variable is preceded by a dollar sign: $env_var. There is also an alternative notation for explicitly delimiting the variable name in which it is embedded in curly brackets: ${env_var}. Let's look at a concrete example:

# set variable 'user' to value 'admin'
ENV user="admin"
# set username to 'admin_user'
USER ${user}_user

The most important Dockerfile instructions

We will now present the most important Dockerfile instructions. Traditionally, some instructions – especially FROM – were only allowed to appear once per Dockerfile. However, there now are multi-stage builds. They describe multiple images in a Dockerfile. The restriction then applies to each individual build stage.
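
A multi-stage build can be sketched roughly as follows. The stage name “build” and the file “/artifact.txt” are hypothetical; note that FROM appears once per build stage:

```dockerfile
# Stage 1, named "build": produces an artifact
FROM busybox AS build
RUN echo "built artifact" > /artifact.txt

# Stage 2: starts again from a fresh base image and
# copies only the artifact from the first stage
FROM busybox
COPY --from=build /artifact.txt /artifact.txt
CMD ["cat", "/artifact.txt"]
```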

| Instruction | Description | Comment |
|---|---|---|
| FROM | Set base image | Must appear as the first instruction; only one entry per build stage |
| ENV | Set environment variables | For build process and container runtime |
| ARG | Declare command line parameters for build process | May appear before the FROM instruction |
| WORKDIR | Change current directory | |
| USER | Change user and group membership | |
| COPY | Copy files and directories to the image | Creates new layer |
| ADD | Copy files and directories to the image | Creates new layer; use is discouraged |
| RUN | Execute command in image during build process | Creates new layer |
| CMD | Set default arguments for container start | Only one entry per build stage |
| ENTRYPOINT | Set default command for container start | Only one entry per build stage |
| EXPOSE | Define port assignments for running container | Ports must be exposed when starting the container |
| VOLUME | Include directory in the image as a volume | Mounted in the host system when starting the container |

FROM instruction

The FROM instruction sets the base image on which subsequent instructions operate. This instruction may only exist once per build stage and must appear as the first instruction. There is one caveat: the ARG instruction may appear before the FROM instruction. You can thus specify exactly which image is used as the base image via a command line argument when starting the build process.
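
As a short sketch of this caveat, the following Dockerfile selects the tag of the base image via a build argument. The variable name “base_tag” and the tag value in the comment are assumptions for illustration:

```dockerfile
# Declared before FROM so that it can be used in the FROM line.
# Can be overridden when starting the build, for example:
#   docker build --build-arg base_tag=1.36 .
ARG base_tag=latest

FROM busybox:${base_tag}
```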

Every Docker image must be based on a base image. In other words, each Docker image has exactly one parent image. This results in a classic chicken-or-the-egg dilemma. The lineage must begin somewhere. In the Docker universe, lineage begins with the “scratch” image. This minimal image serves as the origin of any Docker image.
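
An image that starts directly from “scratch” might look like the following sketch. It assumes that a statically linked executable named “hello” exists in the build context, since the empty base image provides neither a shell nor libraries:

```dockerfile
FROM scratch

# Assumption: "hello" is a statically linked executable in the build context
COPY hello /

# Exec notation is needed here because "scratch" contains no shell
CMD ["/hello"]
```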

ENV and ARG instructions

These two instructions assign a value to a variable. The distinction between the two instructions is primarily where the values come from and the context in which the variables are available. Let's look at the ARG instruction first.

The ARG instruction declares a variable in the Dockerfile that is only available during the build process. The value of a variable declared with ARG is passed as a command line argument when the build process is started. Here is an example in which we are declaring the “user” build variable:

ARG user

When we start the build process, we pass the actual value of the variable:

docker build --build-arg user=admin .

When declaring the variable, you can choose to specify a default value. If a suitable argument is not passed when starting the build process, the variable is given the default value:

ARG user=tester

Without using “--build-arg”, the “user” variable contains the “tester” default value:

docker build .

Next, let's look at the ENV instruction, which defines an environment variable. Unlike the ARG instruction, a variable defined with ENV exists both during the build process and during container runtime. The ENV instruction can be written in two ways.

  1. Recommended notation:

ENV version="1.0"

  2. Alternative notation for backward compatibility:

ENV version 1.0

Tip

The ENV instruction works roughly the same as the “export” command on the command line.
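
The difference in scope between ARG and ENV can be sketched in a single Dockerfile. This is a minimal illustration that reuses the “user” and “version” variables from above; the echoed texts are arbitrary:

```dockerfile
FROM busybox

# Only available during the build; can be overridden with
#   docker build --build-arg user=admin .
ARG user=tester

# Stored in the image: available during the build and in the container
ENV version="1.0"

# During the build, both variables are set
RUN echo "Building version ${version} as ${user}"

# At container runtime, "version" is still set, while "user" is empty
CMD ["sh", "-c", "echo version=${version} user=${user}"]
```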

WORKDIR and USER instructions

The WORKDIR instruction is used to change directories during the build process, as well as when starting the container. Calling WORKDIR applies to all subsequent instructions. During the build process, the RUN, COPY and ADD instructions are affected. During the container runtime, this applies to the CMD and ENTRYPOINT instructions.

Tip

The WORKDIR instruction is roughly equivalent to the cd command on the command line.
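
Here is a minimal sketch of how WORKDIR affects subsequent instructions. The “/app” directory and the readme file from the “doc” subdirectory of the build context are assumptions for illustration:

```dockerfile
FROM busybox

# Creates /app if it does not exist yet and makes it the current directory
WORKDIR /app

# Relative destination: the file ends up at /app/readme.md
COPY ./doc/readme.md ./

# Executed inside /app during the build
RUN ls -l

# Also executed inside /app when the container starts
CMD ["pwd"]
```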

The USER instruction is used to change the current (Linux) user, much as the WORKDIR instruction is used to change the directory. You can also choose to define the user’s group membership. Calling USER applies to all subsequent instructions. During the build process, the RUN instructions are affected by user and group membership. During the container runtime, this applies to the CMD and ENTRYPOINT instructions.

Tip

The USER instruction is roughly equivalent to the su command on the command line.
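
The effect of USER can be sketched as follows. The user name “appuser” is hypothetical, and the sketch assumes that the base image provides the “adduser” command, as BusyBox does:

```dockerfile
FROM busybox

# Create an unprivileged user; this instruction still runs as root
RUN adduser -D appuser

# Switch the user for all subsequent instructions
# (a group can optionally be given as "user:group")
USER appuser

# Runs as "appuser" during the build
RUN whoami

# Also runs as "appuser" when the container starts
CMD ["whoami"]
```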

COPY and ADD instructions

The COPY and ADD instructions are both used to add files and directories to the Docker image. Both instructions create a new layer which is stacked on top of the existing image. The source for the COPY instruction is always the build context. In the following example, we are copying a readme file from the “doc” subdirectory in the build context to the image’s top-level “app” directory:

COPY ./doc/readme.md /app/

Tip

The COPY instruction is roughly equivalent to the cp command on the command line.

The ADD instruction behaves nearly identically, but it can retrieve URL resources outside the build context and unpacks compressed files. In practice, this may lead to unexpected side effects. Therefore, the use of the ADD instruction is expressly discouraged. You should only use the COPY instruction in most cases.
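
The kind of side effect meant here can be sketched with a local archive. The path “./vendor/archive.tar.gz” is a hypothetical file in the build context:

```dockerfile
FROM busybox

# COPY: the archive is copied as-is and remains a single file
COPY ./vendor/archive.tar.gz /opt/copied/

# ADD: a local tar archive is unpacked automatically,
# so /opt/added/ contains the extracted files instead
ADD ./vendor/archive.tar.gz /opt/added/
```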

RUN instruction

The RUN instruction is one of the most common Dockerfile instructions. When we use the RUN instruction, we instruct Docker to execute a command line command during the build process. The resulting changes are stacked on top of the existing image as a new layer. The RUN instruction can be written in two ways:

  1. “Shell” notation: The arguments passed to RUN are executed in the image’s default shell. Special symbols and environment variables are replaced following the shell rules. Here is an example of a call that greets the current user using command substitution "$()":

RUN echo "Hello $(whoami)"

  2. “Exec” notation: Instead of passing a command to the shell, an executable file is called directly. Additional arguments may be passed in the process. Here is an example of a call that invokes the “npm” dev tool and instructs it to run the “build” script:

RUN ["npm", "run", "build"]

Note

In principle, the RUN instruction can be used to replace some of the other Docker instructions. For example, “RUN cd src && npm run build” has much the same effect as “WORKDIR src” followed by “RUN npm run build”, although a directory change made with cd lasts only for that one RUN instruction. However, this approach creates Dockerfiles that become harder to read and manage as they grow. You should therefore use specialized instructions whenever possible.
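
One caveat is worth illustrating: a directory change made with cd inside RUN only lasts for that single instruction, whereas WORKDIR persists. The following sketch uses a hypothetical “/src” directory to show the difference:

```dockerfile
FROM busybox
RUN mkdir /src

# The cd only applies within this one RUN instruction; pwd reports /src
RUN cd /src && pwd

# The next instruction starts in the image's working directory again (here: /)
RUN pwd

# WORKDIR persists for all subsequent instructions; pwd reports /src
WORKDIR /src
RUN pwd
```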

CMD and ENTRYPOINT instructions

The RUN instruction executes a command during the build process, creating a new layer in the Docker image. In contrast, the CMD and ENTRYPOINT instructions execute a command when the container is started. There is a subtle difference between the two instructions.

  • ENTRYPOINT is used to create a container that always performs the same action when started. So, the container behaves like an executable file.
  • CMD is used to create a container that executes a defined action on startup without any further parameters. The preset action can be easily overridden by suitable parameters.

What both instructions have in common is that only one of each takes effect per build stage; if several are listed, only the last one applies. However, you can combine the two instructions. In this case, ENTRYPOINT defines the default action to be performed when the container is started, while CMD defines easily overridden parameters for the action.

Our Dockerfile entry:

ENTRYPOINT ["echo", "Hello"]
CMD ["World"]

The corresponding commands on the command line:

# Output "Hello World"
docker run my_image
# Output "Hello Moon"
docker run my_image Moon
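
For comparison, here is a sketch of a Dockerfile that uses CMD without ENTRYPOINT. In this case, CMD contains the complete default command, and any parameters passed to “docker run” replace it entirely. The image name “my_image” is the same assumption as above:

```dockerfile
FROM busybox

# No ENTRYPOINT: CMD is the complete default command
CMD ["echo", "Hello World"]

# Runs the default command, echo "Hello World":
#   docker run my_image
# Replaces the default command completely and runs echo Moon:
#   docker run my_image echo Moon
```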

EXPOSE instruction

Docker containers communicate over the network. Services running in the container are addressed via specified ports. The EXPOSE instruction documents port assignments and supports TCP and UDP protocols. When a container is started with “docker run -P”, all ports declared with EXPOSE are published on randomly assigned host ports. Alternatively, individual port mappings can be set explicitly with “docker run -p”.

Here is an example. Our Dockerfile contains the following EXPOSE instructions:

EXPOSE 80/tcp
EXPOSE 80/udp

The following ways are then available to activate the ports when the container is started:

# All exposed ports (80/tcp and 80/udp) are published on randomly assigned host ports
docker run -P my_image
# Host port 81 is forwarded to TCP port 80 in the container
docker run -p 81:80/tcp my_image

VOLUME instruction

A Dockerfile defines a Docker image which consists of layers stacked on top of each other. The layers are read-only so that the same state is always guaranteed when a container is started. We need a mechanism to exchange data between the running container and the host system. The VOLUME instruction defines a “mount point” within the container.

Consider the following Dockerfile excerpt. We create a “shared” directory in the image’s top-level directory and then specify that this directory is to be mounted in the host system when the container is started:

RUN mkdir /shared
VOLUME /shared

Note that we cannot specify the actual path on the host system within the Dockerfile. By default, directories defined by the VOLUME instruction are mounted on the host system under “/var/lib/docker/volumes/”.
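
Since the host path cannot be set in the Dockerfile, it is chosen when the container is started. The following sketch shows both variants as comments; the image name “my_image” and the host path “/home/user/data” are hypothetical:

```dockerfile
FROM busybox
RUN mkdir /shared
VOLUME /shared

# Without further options, Docker creates and manages a volume itself:
#   docker run my_image
# Alternatively, a specific host directory is mounted at the mount point:
#   docker run -v /home/user/data:/shared my_image
```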

How do you edit a Dockerfile?

Remember that a Dockerfile is a (plain) text file. It can be edited using the usual methods. A plain text editor is probably the most popular. This can be an editor with a graphical user interface. There is no shortage of options here. The most popular editors include VSCode, Sublime Text, Atom and Notepad++. Alternatively, a number of editors are available on the command line. In addition to the original Vim and Vi editors, the simplified editors Pico and Nano are widely used.

Note

You should only edit a plain text file with editors suitable for this purpose. Under no circumstances should you use a word processor, such as Microsoft Word, Apple Pages, LibreOffice or OpenOffice, to edit a Dockerfile.
