Notes on Dockerfile Usage and Best Practices

These are notes on usage patterns and best practices I have come across while working with Dockerfiles.

Differences between COPY and ADD Commands

Both COPY and ADD are Dockerfile instructions that can copy files or directories from the host to the Docker image. However, there are some differences between them:

  1. Function: The COPY instruction copies files or directories from the build context to the specified path in the image’s file system. The ADD instruction does the same and has two additional features. First, if the source is a local tar archive in a recognized compression format (such as gzip, bzip2, or xz), ADD automatically extracts it into the destination directory. Second, ADD accepts a URL as the source and downloads the file it points to; a tar archive fetched from a URL is not extracted.

  2. Usage Recommendation: Since the ADD instruction has more functions, its behavior is more complex and less predictable. Docker officially recommends using the ADD instruction only when you actually need the additional features it provides. Otherwise, use the COPY instruction by default. This makes the Dockerfile more understandable and maintainable.

  3. Examples:

    • Using the COPY instruction:
    COPY test.txt /data/

    This copies test.txt from the root of the build context to the /data/ directory in the image.

    • Using the ADD instruction:
    ADD https://example.com/test.txt /data/

    This downloads the remote test.txt file and copies it to the /data/ directory in the image.
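
The tar behaviour is the least obvious difference, so here is a minimal sketch for trying it locally. It assumes a POSIX shell with tar and mktemp available; the file name bundle.tar.gz, the directories /copied and /added, and the image tag copy-vs-add-demo are made up for illustration.

```shell
#!/bin/sh
# Create a throwaway build context containing one file and one archive of it.
ctx=$(mktemp -d)
echo "hello" > "$ctx/test.txt"
tar -czf "$ctx/bundle.tar.gz" -C "$ctx" test.txt

# Write a Dockerfile that applies COPY and ADD to the same local archive.
cat > "$ctx/Dockerfile" <<'EOF'
FROM alpine:latest
# COPY keeps the archive as a single file: /copied/bundle.tar.gz
COPY bundle.tar.gz /copied/
# ADD unpacks a local archive: /added/test.txt
ADD bundle.tar.gz /added/
# List both directories when the container runs
CMD ["ls", "-R", "/copied", "/added"]
EOF

echo "Build context written to $ctx"
```

To see the result, run docker build -t copy-vs-add-demo "$ctx" followed by docker run --rm copy-vs-add-demo. The listing should show the archive itself under /copied and the extracted test.txt under /added.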

Multi-stage Builds

Multi-stage builds are a Dockerfile feature introduced in Docker 17.05. They allow you to use multiple FROM instructions in one Dockerfile. Each FROM instruction can use a different base image and starts a new build stage. Each stage produces its own intermediate image, and nothing from an earlier stage ends up in a later one unless it is copied explicitly with COPY --from.

Multi-stage builds have two main advantages: they keep the final production image small, and they keep build-only tools and dependencies out of it.

Here is an example of using multi-stage builds. It first uses the golang image to compile a Go application, and then in a new stage, uses a smaller Alpine-based image to run the application:

# Stage 1: Build the Go binary
FROM golang:1.14.2 AS builder
WORKDIR /go/src/app
COPY . .
RUN go get -d -v ./...
RUN CGO_ENABLED=0 GOOS=linux go build -o app .

# Stage 2: Copy the Go binary into a small Alpine image
FROM alpine:latest
WORKDIR /root/
COPY --from=builder /go/src/app/app .
CMD ["./app"]

In this example, the first build stage, named builder, starts from the golang:1.14.2 image, copies the source code into it, and compiles the Go application.

Then, the second build stage starts. It starts from the smaller alpine:latest image and copies the compiled Go application from the builder stage to the new image.

In this way, the final image only contains the compiled Go application, without additional tools and dependencies such as the Go compiler used for compilation, making the image more lightweight.

Differences between CMD and ENTRYPOINT

CMD sets the default command for the container, optionally with parameters. If arguments are given after the image name in docker run, they replace CMD entirely.

ENTRYPOINT configures the executable that runs when the container starts, so the container behaves like an application or service. Unlike CMD, it is not replaced by arguments passed to docker run. In exec form, those arguments are appended to it instead, and the executable itself can only be changed with the --entrypoint flag. When both are set, CMD supplies the default arguments for ENTRYPOINT.

For this reason, a common recommendation is to use ENTRYPOINT and put all the commands to be executed into a script, which reduces problems caused by argument passing during deployment.
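
The override rules can be summarised with a toy model. The shell function below only illustrates how the final command line is assembled for exec-form instructions; it is not Docker's actual implementation. The name final_command, its three parameters, and the --port flag of the imaginary ./app binary are all hypothetical.

```shell
#!/bin/sh
# Toy model (not Docker code): print the command a container would run.
#   $1 = ENTRYPOINT from the image ("" if unset)
#   $2 = CMD from the image ("" if unset)
#   $3 = arguments given after the image name in `docker run` ("" if none)
final_command() {
  entrypoint=$1
  cmd=$2
  run_args=$3

  # Arguments on the docker run command line replace CMD entirely.
  if [ -n "$run_args" ]; then
    args=$run_args
  else
    args=$cmd
  fi

  # ENTRYPOINT, when set, always comes first; the rest is appended to it.
  if [ -n "$entrypoint" ] && [ -n "$args" ]; then
    printf '%s %s\n' "$entrypoint" "$args"
  else
    printf '%s%s\n' "$entrypoint" "$args"
  fi
}

final_command ""      "./app"  ""             # prints: ./app
final_command ""      "./app"  "sh"           # prints: sh
final_command "./app" "--help" ""             # prints: ./app --help
final_command "./app" "--help" "--port 8080"  # prints: ./app --port 8080
```

The last two calls show why ENTRYPOINT suits images that wrap a single program: whatever is typed after the image name becomes that program's arguments instead of replacing it.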

How to Obtain the SHA256 of a Docker Image

docker inspect --format='{{index .RepoDigests 0}}' <docker image> | cut -d ':' -f 2
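
Two caveats apply to the command above. RepoDigests is normally only populated for images that have been pulled from or pushed to a registry, so for an image that was only built locally the lookup has no entry to return. Also, cut -d ':' -f 2 picks the wrong field when the repository name itself contains a colon, for example a registry address with a port. Below is a minimal sketch of a more tolerant extraction; digest_hex is a hypothetical helper name and the sample reference is made up.

```shell
#!/bin/sh
# Print the hex part of a reference such as "repo@sha256:<hex>".
# Hypothetical helper; it only does string handling and never calls Docker.
digest_hex() {
  # Remove the longest prefix ending in ':', leaving the text after the last colon.
  printf '%s\n' "${1##*:}"
}

ref='localhost:5000/app@sha256:abc123'   # made-up example value

printf '%s\n' "$ref" | cut -d ':' -f 2   # prints: 5000/app@sha256
digest_hex "$ref"                        # prints: abc123
```

With a real image, the helper could be fed the output of the inspect command, for example digest_hex "$(docker inspect --format='{{index .RepoDigests 0}}' <docker image>)".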

Docker Compose

Docker Compose is a tool for defining and running multi-container Docker applications. It allows users to configure the container services, networks, volumes, and other related settings of the entire application through a YAML file (usually named docker-compose.yml). Docker Compose is an orchestration tool officially provided by Docker, mainly used to simplify the process of running multiple Docker containers on a single machine.

In my practical experience, Docker Compose has two main benefits: dependency management and environment switching.
Docker Compose manages the dependency relationships between services so that they start and stop in the correct order.
You can also write a separate docker-compose.yml file for each environment (such as development, testing, and production) and select which files to load with the -f option.
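
As a rough sketch of this environment switching, a small wrapper can pick the file by environment name. When several -f options are given, Compose merges the files in order, with later files overriding earlier ones. The file naming scheme docker-compose.<env>.yml and the function compose_cmd are assumptions for illustration; the function only prints the command so that it can be inspected before running it.

```shell
#!/bin/sh
# Print the docker compose command for an environment (dry run only).
# Assumes a shared base file plus one override file per environment,
# e.g. docker-compose.dev.yml and docker-compose.prod.yml (hypothetical names).
compose_cmd() {
  env_name=${1:-dev}   # default to "dev" when no argument is given
  printf 'docker compose -f docker-compose.yml -f docker-compose.%s.yml up -d\n' "$env_name"
}

compose_cmd prod
# prints: docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
```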

Here is an example with an explanation.

version: '3.9'

services:
  db:
    image: postgres:13-alpine
    environment:
      POSTGRES_USER: myuser
      POSTGRES_PASSWORD: example
    volumes:
      - db_data:/var/lib/postgresql/data

  backend:
    build: ./backend
    depends_on:
      - db
    environment:
      DB_HOST: db
      DB_PORT: 5432
      DB_USER: myuser
      DB_PASS: example

  frontend:
    build: ./frontend
    ports:
      - "80:80"
    depends_on:
      - backend

volumes:
  db_data:

networks:
  default:
    name: my_app_net

In this example:

  • There is a service named db, which runs a container based on the Postgres database image.
  • The backend service depends on the db service, expressed with the depends_on keyword. With this short form, Compose starts the backend container after the db container has been started, but it does not wait for Postgres to be ready to accept connections. Waiting for readiness requires a healthcheck on db together with the long form of depends_on (condition: service_healthy).
  • The backend service needs to connect to the db service, so it sets DB_HOST to db: within the same Docker Compose network, services can reach each other by service name.
  • The frontend service in turn depends on the backend service, so its container is started only after the backend container has been started.
  • The volumes section defines a named volume db_data that stores the data of the db service, so the data survives when the container is restarted or recreated.
  • Finally, the networks section names the default network my_app_net. All services join this network automatically, so they can communicate with each other.
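
Because the short form of depends_on only orders container startup, applications often add their own retry loop before connecting to the database. Below is a minimal sketch of such a loop, assuming it runs in the backend container's startup script. wait_for is a hypothetical helper, and the pg_isready call in the usage note assumes the PostgreSQL client tools are installed in that image.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempts run out.
#   $1 = maximum number of attempts
#   $2 = seconds to sleep between attempts
#   remaining arguments = the command to run
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0          # command succeeded, dependency is ready
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1              # still failing after all attempts
}

wait_for 3 0 true  && echo "ready"     # prints: ready
wait_for 2 0 false || echo "gave up"   # prints: gave up
```

In the backend container this might be used as wait_for 30 1 pg_isready -h "$DB_HOST" -p "$DB_PORT" before starting the application, using the DB_HOST and DB_PORT variables from the Compose file.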