If you’ve been following along with my attempt to migrate this WordPress site into container services with Podman, then you will be happy to know that I’ve achieved the first milestone. The database for this site now resides pleasantly in a rootless Podman container.
One of the major reasons I wanted to try Podman was that, outside of installing the package itself, everything I wanted to run could be achieved as a non-root, non-privileged account. So far the non-root containers are living up to the marketing material, which is something I wish we could see more of in the tech world.
That isn’t to say there were not challenges. If you read my last post you’ll get a pretty good picture of some of the issues I ran into while getting my feet wet with this project. https://sudoedit.com/podman-selinux-and-systemd/.
This post is an extension of my last, so if you are trying to get started I suggest you read that post first. This post is a combination of a tutorial and a deep dive into the weird ways my brain tries to understand this stuff - I hope you stick with me through it.
Here is a quick recap of the build process:
FROM registry.fedoraproject.org/fedora:32
MAINTAINER [email protected]
RUN dnf -y install mariadb-server mariadb
COPY mariadb-service-limits.conf /etc/systemd/system/mariadb.service.d/limits.conf
RUN systemctl enable mariadb
RUN systemctl disable systemd-update-utmp.service
ENTRYPOINT ["/sbin/init"]
CMD ["/sbin/init"]
This file pulls the standard Fedora 32 container image from the Fedora container registry. My reasoning for doing so is outlined in my earlier post, and I encourage you to read it if you are so inclined. This container file defines my build as a standard Fedora 32 system with MariaDB installed, a systemd drop-in file that raises the open-file limit for the MariaDB service, and systemd inside the container managing the MariaDB service.
I realize that having systemd running in my container is controversial for some people, as it goes against the “single process” container model. I don’t hold the single process per container philosophy so I’m just going to ignore it for now. I doubt there really is such a thing as a true single process container. I prefer to think of it as a single purpose container. The purpose in this case is to get a database up and running, and this accomplishes that goal.
Containers are (generally) ephemeral: any changes you make inside a running container will disappear when that container is removed and a new one is started from the image. This can be a problem for a database since its goal is to store persistent data. The solution here is to mount some storage from the host into the container and allow the container process to write to that location.
I chose to create the following directory for my database: /srv/sudoedit/data/db
The directory tree looks like this:
/srv
└── sudoedit
    └── data
        └── db
Each directory in the tree needs to be owned by the user account and group that the container will run as, at least up to the point where the volume will be mounted - in this case that is the directory “db”. The mount point itself, where your data will be stored, needs to be owned by the “mysql” user that exists in the container. Not a mysql user on the host - you don’t need a mysql user on the host.
For clarification, I’ve chosen to mount /srv/sudoedit/data/db on the container host to /var/lib/mysql inside the container. I’ll show you how to do that a little later on in the unit file that manages this service on the host. For now, what you need to know is that the last directory in that chain, db, and everything underneath it needs to be owned by the mysql user in the container. How do we do that? Enter the dark world of user namespaces.
If you want to learn a little something about user namespaces (and you really do if you want to use Podman) then read these articles from Red Hat https://www.redhat.com/sysadmin/rootless-podman-makes-sense and opensource.com https://opensource.com/article/19/2/how-does-rootless-podman-work.
For our purposes, the key takeaway from those articles is that in order to change the owner of the directory at our mount point to the “mysql” user for the container, we need to enter the user namespace the container will run in and change ownership of the db directory from there.
We do that with the podman unshare command.
After you create your directory tree, change ownership of the tree to the account that you want to use to run the container, then switch to that user.
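The create-and-chown step could look something like the minimal sketch below. The function name prepare_db_tree and its two arguments are hypothetical, introduced only for illustration; on a real host you would run it as root, with /srv as the root of the tree and the name of whatever account will run the container.

```shell
#!/bin/sh
# Sketch only: create the data tree and hand it to the account that will
# run the container. prepare_db_tree is a hypothetical helper name.
#   $1 = root of the tree (e.g. /srv)
#   $2 = owner to pass to chown (a user name or numeric UID)
prepare_db_tree() {
    tree_root="$1"
    tree_owner="$2"
    mkdir -p "$tree_root/sudoedit/data/db" || return 1
    chown -R "$tree_owner" "$tree_root/sudoedit"
}

# Example (as root on the host):
#   prepare_db_tree /srv app-svc-account
```

Once the tree belongs to the service account, switch to that account before running any podman commands, so that they run in that account's user namespace.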
Now you want to use podman unshare to set the owner of your directory to the mysql user in the container like this:
$ podman unshare chown 27:27 /srv/sudoedit/data/db
In my container the mysql user has the UID and GID 27 so I set the ownership using those values - note that you do not need to run this as root. You are running that command as the normal unprivileged user account that will run the container - no sudo required. In this instance you don’t need sudo or root on the host, because when you enter the user namespace that the container will run in, your user account is treated as the root account in the container. Therefore, you have permission to change the ownership of arbitrary files to anything you want.
So let’s take a look at the permissions on the db directory:
cd /srv/sudoedit/data
ls -lZ
total 4
drwxr-xr-x. 5 296634 296634 system_u:object_r:container_file_t:s0:c836,c854 4096 Jun 10 01:06 db
Notice a couple things here
In your case, the UID and GID listed will likely be different from the ones I posted here. This is because your user has been given control of a range of subuids that will be mapped into any containers created by that user. Read the articles by Dan Walsh that I linked earlier. Basically, the UID and GID we see from the host perspective represent the UID and GID that get mapped into the container and become 27:27 inside the container namespace.
Here is the simplest way I can explain it. Cat out the contents of /etc/subuid:
cat /etc/subuid
user1:100000:65536
luke:165536:65536
app-svc-account:296608:65536
Notice the starting subuids above for user1, luke, and app-svc-account - 100000, 165536, and 296608 respectively.
user1 on the host gets to have the UIDs 100000 through 165535 mapped into their containers, luke gets the next 65,536 UIDs, and so on. Each UID range encompasses 65,536 UIDs. This range is configurable.
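If you want to see the first and last UID of each range without doing the sums by hand, a small sketch like this works on any input in the name:start:count format. The function name subuid_ranges is my own invention for this example, and the sample lines are the ones from the listing above rather than a real /etc/subuid.

```shell
#!/bin/sh
# Sketch: print the first and last UID of each subuid range.
# Reads lines in /etc/subuid format (name:start:count) on stdin.
subuid_ranges() {
    awk -F: 'NF == 3 { printf "%s %d-%d\n", $1, $2, $2 + $3 - 1 }'
}

# Sample data copied from the listing above.
printf '%s\n' \
    'user1:100000:65536' \
    'luke:165536:65536' \
    'app-svc-account:296608:65536' | subuid_ranges
```

For the sample data this prints 100000-165535 for user1, 165536-231071 for luke, and 296608-362143 for app-svc-account. On a real host you could run `subuid_ranges < /etc/subuid` instead.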
The app-svc-account is the service account I designated to run the MariaDB container. That means the host OS has allowed that account to control the 65,536 UIDs starting at 296608. Root in the container (UID 0) maps to the service account’s own UID on the host, and container UID 1 maps to the first subuid, 296608. Since the mysql user has UID 27 in the container, its host UID is 296608 + 26, which is “296634”. This is ideal because on the host machine there are no users with UIDs in that range, and there never will be - which means if a process did escape the container, it would not have access to any files owned by a real user on the host.
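To make the arithmetic concrete, here is a small sketch. It assumes the default rootless mapping, where UID 0 in the container is the account's own host UID and container UID 1 is the first subuid. The function name container_to_host_uid and the host UID 1001 are made up for illustration.

```shell
#!/bin/sh
# Sketch: host UID for a container UID under the default rootless mapping.
#   $1 = UID inside the container
#   $2 = first subuid assigned to the account (from /etc/subuid)
#   $3 = the account's own UID on the host
container_to_host_uid() {
    if [ "$1" -eq 0 ]; then
        # Container root is the unprivileged account itself.
        echo "$3"
    else
        # Container UID 1 is the first subuid, so subtract 1.
        echo $(( $2 + $1 - 1 ))
    fi
}

# 1001 is a made-up host UID for app-svc-account.
container_to_host_uid 27 296608 1001
```

With the values from this post the function prints 296634, which matches the owner shown by ls on the host. If you pass custom mapping options to podman, this simple formula no longer applies.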
NOTE: If you are using some kind of central authorization like LDAP or Active Directory then you will need to give some serious thought to how you want to handle the subuid issue on your hosts. I’m not going to even begin to think about it here. Not yet at least, but it could be a real problem if you have subuids that overlap with real UIDs for real users.
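As a starting point for spotting that kind of overlap, here is a rough sketch that tests whether a single UID falls inside any subuid range. The helper name uid_in_subuid_range is hypothetical, and the demo runs against a throwaway copy of the sample data from this post, not against a real directory service.

```shell
#!/bin/sh
# Sketch: report which subuid range, if any, contains a given UID.
#   $1 = UID to test
#   $2 = file in /etc/subuid format (name:start:count)
# Prints the owning account name and returns 0 on a match, 1 otherwise.
uid_in_subuid_range() {
    awk -F: -v uid="$1" '
        NF == 3 && uid + 0 >= $2 + 0 && uid + 0 < $2 + $3 {
            print $1
            found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$2"
}

# Demo against a throwaway copy of the sample data from this post.
sample=$(mktemp)
printf '%s\n' 'user1:100000:65536' 'luke:165536:65536' \
    'app-svc-account:296608:65536' > "$sample"
uid_in_subuid_range 296634 "$sample"   # falls in app-svc-account's range
uid_in_subuid_range 1000 "$sample" || echo "1000 is not in any subuid range"
rm -f "$sample"
```

On a real host you would feed it the UIDs of your actual users. How you list those depends on your directory setup, since enumeration is often disabled for LDAP or Active Directory backends, so treat this as a building block rather than a complete audit.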
Use the loginctl command to enable users who are not logged in to start processes.
enable-linger [USER...], disable-linger [USER...]
Enable/disable user lingering for one or more users. If enabled for a specific user, a user manager is spawned for the user at boot and kept around after logouts. This allows users who are not logged in to run long-running services.

sudo loginctl enable-linger <username>
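To confirm that lingering took effect you can ask logind directly with `loginctl show-user <username> --property=Linger`. The sketch below is an alternative that only looks for the per-user marker file; it assumes the usual systemd state directory /var/lib/systemd/linger, and the function name linger_enabled is hypothetical.

```shell
#!/bin/sh
# Sketch: check whether lingering is enabled for a user by looking for
# the marker file that systemd-logind keeps per lingering user.
#   $1 = user name
#   $2 = linger directory (optional; defaults to /var/lib/systemd/linger)
linger_enabled() {
    [ -e "${2:-/var/lib/systemd/linger}/$1" ]
}

if linger_enabled app-svc-account; then
    echo "linger enabled"
else
    echo "linger not enabled"
fi
```

The optional second argument exists only so the check can be pointed at a different directory when trying it out.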
Next, I created the following systemd unit file which allows my app-svc-account user to start the MariaDB container at startup.
I named my unit file db-sudoedit.service and placed it at /usr/lib/systemd/system/
[Unit]
Description=Podman container - db-sudoedit.com
After=sshd.service

[Service]
Type=simple
User=app-svc-account
ExecStart=/usr/bin/podman run -i --read-only --rm -p 3306:3306 --name db-sudoedit.com -v /srv/sudoedit/data/db/:/var/lib/mysql:Z --tmpfs /etc --tmpfs /var/log --tmpfs /var/tmp localhost/sudoedit-db
ExecStop=/usr/bin/podman stop -t 3 db-sudoedit.com
ExecStopPost=/usr/bin/podman rm -f db-sudoedit.com
Restart=always

[Install]
WantedBy=multi-user.target
It’s a pretty standard unit file. But I want to point out a few things.
All that’s left to do is start and enable the service and you should be up and running.
sudo systemctl enable db-sudoedit.service --now
If all goes well you should have your database up and running on port 3306 on your host machine.
You can switch over to your service account again and check for the running container:
podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
194793d82d0b localhost/sudoedit-db:latest /sbin/init About an hour ago Up About an hour ago db-sudoedit.com 3306

podman container top -l | grep mysql
mysql 144 1 0.046 1h12m52.971462433s ? 2s /usr/libexec/mysqld --basedir=/usr
The next step for me is to get an Nginx container up and running to act as a reverse proxy for my Apache vhosts. I’m going to break the one-process-per-container rule again, and have it run both the Nginx reverse proxy and certbot for my Let’s Encrypt SSL termination.
Once I get that up and running I’ll update the blog with any tips I learned along the way. I will also include a brief discussion of privileged ports, and whether or not I chose to use a root container for Nginx or if I end up allowing non-root users to bind privileged ports.
If you read through this, and you think I missed something, or have any questions let me know.
If you’d like to get in touch, contact me via email or follow me on Twitter.