Docker Best Practices

Introduction

After using Docker for the last several years, I’d like to share some best practices that work in production.

Reduce container image size

In the cloud-native world, infrastructure is disposable and immutable. As a result, if your Kubernetes pod is rescheduled to another node, the new node needs to pull the Docker image, and the larger the image, the longer that takes.

Small Docker images provide the following benefits:

Smaller attack surface. If the image contains only your application binaries and their direct dependencies, without a full-blown OS, there are fewer packages that can have vulnerabilities, so you will need to apply security patches less often.
Faster application startup. The container runtime downloads a smaller image faster, so a rescheduled pod starts sooner.
Lower network utilization. Each node pulls fewer bytes, which reduces network bandwidth usage.
Lower cost. Smaller images take less space in the registry and on each node. In the cloud you pay for storage, so using less space saves you money.

How?

There are several techniques to reduce image size:

Use distroless base images. “Distroless” images contain only your application and its runtime dependencies. They do not contain package managers, shells or any other programs you would expect to find in a standard Linux distribution.
If you need a basic shell, try busybox or Alpine, a minimalistic Linux distribution. One caveat is that Alpine uses musl libc instead of glibc, so you might need to pass extra flags when compiling your app to make it compatible with Alpine.
Use multi-stage builds. They let you build the application in one stage and then copy only the resulting binaries into a final image that contains no compilers, build tools, source code, or other intermediate data. Multi-stage builds also remove the need to chain every command into a single RUN instruction to keep the image small, which makes the Dockerfile easier to read.
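As a sketch of the techniques above, here is a multi-stage Dockerfile for a hypothetical Go application, assuming a Go module at the repository root. The Go version, the output path /out/app, and the choice of gcr.io/distroless/static as the final base are illustrative assumptions. Tags are used for readability; in production you would pin both images by digest.

```dockerfile
# Stage 1: build with the full Go toolchain (version is an assumption)
FROM golang:1.15 AS build
WORKDIR /src
# Copy the (hypothetical) application source into the build stage
COPY . .
# CGO_ENABLED=0 produces a statically linked binary with no libc dependency
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: minimal runtime image with no compiler, shell or package manager
FROM gcr.io/distroless/static
# Copy only the compiled binary out of the build stage
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```

The build stage and its toolchain are discarded; only the binary is carried into the final image. For Go, CGO_ENABLED=0 is also the usual flag for running on Alpine, since a binary dynamically linked against glibc will not run on musl.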

Security is important

The principle of least privilege should be applied as much as possible. By default, the process in a Docker container runs as root (UID 0), the same UID as root on the host unless user namespace remapping is enabled. If a new kernel vulnerability is discovered, a malicious container might try to exploit it and escape to the host with root permissions.

How?

Set USER in the Dockerfile so the container runs as an unprivileged user.
Do not use the latest tag in FROM. With latest, you get whatever base image is newest at build time. This has the following downsides:
1. If the upstream repository is compromised, latest might point to a compromised image.
2. If the upstream repository bumps the version, you might get an incompatible version of the software in your image. Dependency updates should be deliberate and controlled, not happen ad hoc.
Use a digest (@sha256:<digest>) in FROM to pin the exact version of the image you’re pulling. Unlike a tag, a digest cannot be moved to point at different content. The digest is shown on the tag page on Docker Hub, or you can get it from the output of docker pull:

 $ docker pull alpine:3.12.0
 3.12.0: Pulling from library/alpine
 df20fa9351a1: Pull complete
 Digest: sha256:185518070891758909c9f839cf4ca393ee977ac378609f700f60a771a2dfe321
 Status: Downloaded newer image for alpine:3.12.0
 docker.io/library/alpine:3.12.0

The Dockerfile will then look like this:

FROM alpine@sha256:185518070891758909c9f839cf4ca393ee977ac378609f700f60a771a2dfe321
COPY ...
# And so on.

Use your own Docker registry. Rebuild all required base images yourself and push them there. This lets you control which base images are used.
Prohibit running containers from docker.io in production. If you run on Kubernetes, use Open Policy Agent Gatekeeper or a similar admission controller to enforce this. docker.io contains many images built both by well-known companies and by random people, and not all of them have good intentions.
Do not store any sensitive information in the image: no passwords, no cloud credentials. Anyone who can pull the image can read them from its layers. Pass secrets at runtime instead, either via a mounted volume or via environment variables set when the container starts (not via ENV in the Dockerfile, which is stored in the image).
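A minimal sketch combining several of the points above: a digest-pinned Alpine base, an unprivileged user set with USER, and no credentials baked into the image. The user and group name app and the binary path are hypothetical.

```dockerfile
# Base pinned by digest (the alpine:3.12.0 digest from the docker pull output above)
FROM alpine@sha256:185518070891758909c9f839cf4ca393ee977ac378609f700f60a771a2dfe321

# Create an unprivileged system group and user (the name "app" is hypothetical)
RUN addgroup -S app && adduser -S -G app app

# Hypothetical binary from the build context; left owned by root so the
# unprivileged user cannot overwrite it
COPY app /usr/local/bin/app

# From here on, including the container's main process, run as "app"
USER app

# Deliberately no ENV with passwords and no COPY of key files:
# secrets are supplied at runtime instead
ENTRYPOINT ["/usr/local/bin/app"]
```

Secrets are then passed only when the container starts, for example docker run -e DB_PASSWORD -v /srv/secrets:/run/secrets:ro myapp:1.0, where DB_PASSWORD, /srv/secrets and myapp:1.0 are hypothetical names. With -e and no value, docker run forwards the variable from the calling shell.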

Improve maintainability

Add LABEL instructions with information about the image maintainer and other metadata relevant to your organization.
Use ARG to pass in the base image. This lets you set the base image from outside the Dockerfile and manage base images at scale if you have a large organization or hundreds of different images.
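A sketch of both maintainability points, assuming a hypothetical internal registry at registry.example.com. An ARG declared before the first FROM can be used in FROM, and the labels use the pre-defined OCI annotation keys; the values shown are placeholders.

```dockerfile
# Default base image; override at build time with --build-arg BASE_IMAGE=...
# (registry.example.com is a hypothetical internal registry)
ARG BASE_IMAGE=registry.example.com/base/alpine:3.12.0
FROM ${BASE_IMAGE}

# Descriptive metadata using OCI pre-defined annotation keys (placeholder values)
LABEL org.opencontainers.image.authors="platform-team@example.com" \
      org.opencontainers.image.source="https://git.example.com/team/app" \
      org.opencontainers.image.version="1.0.0"

# Hypothetical application binary from the build context
COPY app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```

A central team can then switch every image to a new base without editing each Dockerfile, for example docker build --build-arg BASE_IMAGE=registry.example.com/base/alpine@sha256:<digest> -t myapp:1.0 . and the labels can be checked with docker inspect --format '{{json .Config.Labels}}' myapp:1.0.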

Oleg Atamanenko

