MariaDB migration to Podman

10 Jun, 2020

Just some guy on the internet.

If you've been following along with my attempt to migrate this WordPress site into container services with Podman, then you will be happy to know that I've achieved the first milestone. The database for this site now resides pleasantly in a rootless Podman container.

One of the major reasons I wanted to try Podman was that, outside of installing the package itself, everything I wanted to run could be achieved as a non-root, non-privileged account. So far the non-root containers are living up to the marketing material, which is something I wish we could see more of in the tech world.

That isn't to say there were no challenges. If you read my last post you'll get a pretty good picture of some of the issues I ran into while getting my feet wet with this project: https://sudoedit.com/podman-selinux-and-systemd/

This post is an extension of my last, so if you are trying to get started I suggest you read that post first. This post is a combination of a tutorial and a deep dive into the weird ways my brain tries to understand this stuff - I hope you stick with me through it.

How to set up the MariaDB service

Here is a quick recap of the build process:

Containerfile

    FROM registry.fedoraproject.org/fedora:32
    MAINTAINER luke@sudoedit.com
    RUN dnf -y install mariadb-server mariadb
    COPY mariadb-service-limits.conf /etc/systemd/system/mariadb.service.d/limits.conf
    RUN systemctl enable mariadb
    RUN systemctl disable systemd-update-utmp.service
    ENTRYPOINT ["/sbin/init"]
    CMD ["/sbin/init"]

This file pulls the standard Fedora 32 container image from the Fedora container registry. My reasoning for doing so is outlined in my earlier post, and I encourage you to read it if you are so inclined. The Containerfile defines my build as a standard Fedora 32 system with MariaDB installed, adds a systemd drop-in file that raises the open-files limit for the MariaDB service, and lets systemd inside the container manage the MariaDB service.
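
The limits file itself isn't reproduced in this post. As a rough sketch only, a drop-in of that kind might look something like the one written out below. `LimitNOFILE` is a real systemd directive, but the value 10000 is a placeholder I made up for illustration, not the value used in the actual build. The file is written to a scratch directory so nothing on the system is touched.

```shell
# Write a hypothetical mariadb-service-limits.conf into a scratch directory.
# LimitNOFILE is a real systemd directive; the value 10000 is only a
# placeholder and is not taken from the original build.
workdir="$(mktemp -d)"
cat > "$workdir/mariadb-service-limits.conf" <<'EOF'
[Service]
LimitNOFILE=10000
EOF

# Show what would be copied into the image by the COPY instruction.
cat "$workdir/mariadb-service-limits.conf"
```

The file has to sit next to the Containerfile at build time, because the COPY instruction resolves its source relative to the build context.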

I realize that having systemd running in my container is controversial for some people, as it goes against the "single process" container model. I don't hold the single process per container philosophy so I'm just going to ignore it for now. I doubt there really is such a thing as a true single process container. I prefer to think of it as a single purpose container. The purpose in this case is to get a database up and running, and this accomplishes that goal.

Data Volume

Containers are (generally) ephemeral. Any changes you make inside a running container disappear when the container is removed and recreated, which is what happens on every restart when the container is run with --rm, as it is here. This can be a problem for a database since its goal is to store persistent data. The solution here is to mount some storage from the host into the container and allow the container process to write to that location.

I chose to create the following directory for my database: /srv/sudoedit/data/db

The directory tree looks like this:

    /srv
    └── sudoedit
        └── data
            └── db

Each directory in the tree needs to be owned by the user account and group that the container will run as, at least up to the point where the volume will be mounted - in this case that is the directory "db". The directory where your data will actually be stored needs to be owned by the "mysql" user that exists in the container. Not a mysql user on the host - you don't need a mysql user on the host.

For clarification, I've chosen to mount /srv/sudoedit/data/db on the container host to /var/lib/mysql inside the container. I'll show you how to do that a little later on in the unit file that manages this service on the host. For now, what you need to know is that the last directory in that chain, db and everything underneath it, needs to be owned by the mysql user in the container. How do we do that? - Enter the dark world of user namespaces.

User Namespaces

If you want to learn a little something about user namespaces (and you really do if you want to use Podman) then read these articles from Red Hat https://www.redhat.com/sysadmin/rootless-podman-makes-sense and opensource.com https://opensource.com/article/19/2/how-does-rootless-podman-work.

For our purposes, the key takeaway from those articles is that in order to change the owner of the directory at our mount point to the "mysql" user for the container, we need to enter the container namespace and change ownership of the db directory.

We do that with the podman unshare command.

After you create your directory tree, change ownership of the tree to the account that you want to use to run the container, then switch to that user.
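
As a small sketch of that step: the snippet below builds the same tree under a scratch directory so it can be tried without sudo. The `base` variable is something I introduced for the example; on the real host it would be /srv, and the chown shown in the comment would be run with root privileges.

```shell
# base defaults to a scratch directory so this can be tried safely;
# on the real host it would be /srv and mkdir would need sudo.
base="${base:-$(mktemp -d)}"
mkdir -p "$base/sudoedit/data/db"

# On the real host, hand the tree to the account that runs the container:
#   sudo chown -R app-svc-account:app-svc-account /srv/sudoedit

# List every directory in the tree with its numeric owner and group.
find "$base/sudoedit" -type d -exec ls -ldn {} +
```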

Now you want to use podman unshare to set the owner of your directory to the mysql user in the container like this:

    $ podman unshare chown 27:27 /srv/sudoedit/data/db

In my container the mysql user has the UID and GID 27 so I set the ownership using those values - note that you do not need to run this as root. You are running that command as the normal unprivileged user account that will run the container - no sudo required. In this instance you don’t need sudo or root on the host, because when you enter the user namespace that the container will run in, your user account is treated as the root account in the container. Therefore, you have permission to change the ownership of arbitrary files to anything you want.

So let's take a look at the permissions on /srv/sudoedit/data/db:

    cd /srv/sudoedit/data
    ls -lZ
    total 4
    drwxr-xr-x. 5 296634 296634 system_u:object_r:container_file_t:s0:c836,c854 4096 Jun 10 01:06 db

Notice a couple of things here:

- What's up with that UID and GID? We set 27:27, not 296634:296634.
- Make note of the SELinux file label container_file_t. Any file that you want the container process to interact with needs that label. See:
  - https://docs.fedoraproject.org/en-US/Fedora/11/html/Security-Enhanced_Linux/sect-Security-Enhanced_Linux-Working_with_SELinux-SELinux_Contexts_Labeling_Files.html
  - https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/selinux_users_and_administrators_guide/sect-security-enhanced_linux-working_with_selinux-selinux_contexts_labeling_files

In your case, the UID and GID listed will likely be different than the ones I posted here. This is because your user has been given control of a number of subuid's that will be mapped into any containers that are created by that user. Read the articles by Dan Walsh that I posted earlier. Basically, the uid and gid we see from the host perspective represent the uid and gid that get mapped into the container and become 27:27 inside the container namespace.

How does the uid mapping work?

Here is the simplest way I can explain it. Cat out the contents of /etc/subuid:

    cat /etc/subuid
    user1:100000:65536
    luke:165536:65536
    app-svc-account:296608:65536

Notice the above subuid's for user1, luke, and app-svc-account are set to 100000, 165536, and 296608 respectively.

user1 on the host gets to have the UID's 100000 through 165535 mapped into their containers, luke gets the next 65,536 UID's, and so on. Each UID range encompasses 65536 UID's. This range is configurable.
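
To see those ranges spelled out, here is a quick sketch that turns each subuid entry into a first-last pair. It works on a scratch copy of the sample listing above rather than the real /etc/subuid, and `subuid_ranges` is just a helper name I picked for the example.

```shell
# Sample data copied from the /etc/subuid listing above, written to a scratch
# file so the sketch does not depend on the host's real /etc/subuid.
subuid_file="$(mktemp)"
cat > "$subuid_file" <<'EOF'
user1:100000:65536
luke:165536:65536
app-svc-account:296608:65536
EOF

# Print "name first-last" for each entry, where last = start + count - 1.
subuid_ranges() {
    awk -F: '{ printf "%s %d-%d\n", $1, $2, $2 + $3 - 1 }' "$1"
}

subuid_ranges "$subuid_file"
```

For the sample data this prints 100000-165535 for user1, 165536-231071 for luke, and 296608-362143 for app-svc-account. Pointing the function at /etc/subuid instead shows the ranges on your own host.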

The app-svc-account is the service account I designated to run the MariaDB container. That means the host OS has allowed that account to control 65536 UIDs starting at 296608. Root in the container (UID 0) maps to the service account's own UID on the host, and container UID 1 maps to the first subuid, 296608. Since the mysql user has UID 27 in the container, its host UID is 296608 + 26 = 296634. This is ideal because on the host machine there are no users with UIDs in that range, and there never will be - which means if a process did escape the container, it would not have access to any files owned by a real user on the host.
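
The arithmetic can be sketched as a small shell function. It assumes Podman's default rootless mapping, where container UID 0 maps to the account's own UID and container UIDs from 1 upward map onto the subuid range in order. The function name `host_uid` and the UID 1001 for app-svc-account are made up for the example; only 296608 and 27 come from this post.

```shell
# host_uid OWNER_UID SUBUID_START CONTAINER_UID
# Container UID 0 maps to the account's own UID on the host; container UID
# n >= 1 maps to SUBUID_START + n - 1 (default rootless mapping assumed).
host_uid() {
    if [ "$3" -eq 0 ]; then
        echo "$1"
    else
        echo $(( $2 + $3 - 1 ))
    fi
}

# 1001 is a made-up UID for app-svc-account; 296608 comes from /etc/subuid.
host_uid 1001 296608 27   # mysql in the container
host_uid 1001 296608 0    # root in the container
```

The first call prints 296634, matching the owner shown by ls -lZ earlier, and the second prints 1001, the (hypothetical) UID of the service account itself.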

NOTE: If you are using some kind of central authorization like LDAP or Active Directory, then you will need to give some serious thought to how you want to handle the subuid issue on your hosts. I'm not going to even begin to think about it here. Not yet at least, but it could be a real problem if you have subuids that overlap with real UIDs for real users.
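
As a starting point for spotting that kind of overlap, here is a rough sketch that compares a passwd-format list of accounts against the subuid ranges. Everything in it is illustrative: the sample files, the `ldapuser` account with UID 300000, and the `find_overlaps` helper are all invented for the example.

```shell
# Scratch copies of a subuid file and a passwd-format account list.
# ldapuser and its UID 300000 are invented to demonstrate a collision.
subuid_sample="$(mktemp)"
passwd_sample="$(mktemp)"
cat > "$subuid_sample" <<'EOF'
user1:100000:65536
luke:165536:65536
app-svc-account:296608:65536
EOF
cat > "$passwd_sample" <<'EOF'
luke:x:1000:1000::/home/luke:/bin/bash
ldapuser:x:300000:300000::/home/ldapuser:/bin/bash
EOF

# find_overlaps SUBUID_FILE PASSWD_FILE
# Reads the subuid ranges first, then reports every account whose UID
# falls inside one of them.
find_overlaps() {
    awk -F: '
        NR == FNR {
            owner[NR] = $1; first[NR] = $2 + 0; last[NR] = $2 + $3 - 1; n = NR
            next
        }
        {
            for (i = 1; i <= n; i++)
                if ($3 + 0 >= first[i] && $3 + 0 <= last[i])
                    printf "%s (uid %d) overlaps the subuid range of %s\n", $1, $3, owner[i]
        }
    ' "$1" "$2"
}

find_overlaps "$subuid_sample" "$passwd_sample"
```

With the sample data the only line reported is for ldapuser, whose UID lands inside the app-svc-account range. On a real host you could feed it /etc/subuid and the output of getent passwd saved to a file, though whether getent lists directory users at all depends on how enumeration is configured.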

Enable the service account to start services without logging in.

Use the loginctl command to enable users who are not logged in to start processes.

    enable-linger [USER...], disable-linger [USER...]
               Enable/disable user lingering for one or more users. If enabled for a specific user, a user manager is spawned
               for the user at boot and kept around after logouts. This allows users who are not logged in to run long-running
               services.

    sudo loginctl enable-linger <username>
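
If you want to confirm that lingering took effect, one way is to look for the flag file that logind keeps per user. This is a sketch under the assumption that your system uses the default location, /var/lib/systemd/linger; the `linger_enabled` helper and its optional directory argument are mine, added only so the check can be pointed at a scratch directory.

```shell
# linger_enabled USER [DIR]
# logind records lingering as an empty file named after the user, by default
# under /var/lib/systemd/linger. DIR is overridable only so the check can be
# exercised against a scratch directory.
linger_enabled() {
    [ -e "${2:-/var/lib/systemd/linger}/$1" ]
}

if linger_enabled app-svc-account; then
    echo "linger is enabled for app-svc-account"
else
    echo "linger is not enabled for app-svc-account"
fi
```

Running loginctl show-user app-svc-account --property=Linger should tell you the same thing straight from logind.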

Systemd unit file

Next, I created the following systemd unit file which allows my app-svc-account user to start the MariaDB container at startup.

I named my unit file db-sudoedit.service and placed it at /usr/lib/systemd/system/

    [Unit]
    Description=Podman container - db-sudoedit.com
    After=sshd.service

    [Service]
    Type=simple
    User=app-svc-account
    ExecStart=/usr/bin/podman run -i --read-only --rm -p 3306:3306 --name db-sudoedit.com -v /srv/sudoedit/data/db/:/var/lib/mysql:Z --tmpfs /etc --tmpfs /var/log --tmpfs /var/tmp localhost/sudoedit-db
    ExecStop=/usr/bin/podman stop -t 3 db-sudoedit.com
    ExecStopPost=/usr/bin/podman rm -f db-sudoedit.com
    Restart=always

    [Install]
    WantedBy=multi-user.target

It's a pretty standard unit file. But I want to point out a few things.

- Notice I set the service to start "After" sshd. This is because I need the login service to be started, and for now the best way to make sure that the system is ready to log users in is to wait for the sshd service to be up.
- Notice the User definition - systemd is starting this container service as a standard non-root user.
- The ExecStart definition - the Z (capital Z) in the volume declaration indicates the following: "The Z option tells Podman to label the content with a private unshared label. Only the current container can use a private volume." - from man podman-run.

All that's left to do is start and enable the service and you should be up and running.

    sudo systemctl enable db-sudoedit.service --now

If all goes well you should have your database up and running on port 3306 on your host machine.

You can switch over to your service account again and check for the running container:

    podman ps
    CONTAINER ID  IMAGE                                                 COMMAND     CREATED            STATUS                PORTS  NAMES
    194793d82d0b  localhost/sudoedit-db:latest  /sbin/init  About an hour ago  Up About an hour ago         db-sudoedit.com   3306

    podman container top -l | grep mysql
    mysql   144   1      0.046   1h12m52.971462433s   ?     2s     /usr/libexec/mysqld --basedir=/usr

What next?

The next step for me is to get an Nginx container up and running to act as a reverse proxy for my Apache vhosts. I'm going to break the one-process-per-container rule again and have it handle both the Nginx reverse proxy and certbot for my Let's Encrypt SSL termination.

Once I get that up and running I'll update the blog with any tips I learned along the way. I will also include a brief discussion of privileged ports, and whether or not I chose to use a root container for Nginx or if I end up allowing non-root users to bind privileged ports.

If you read through this, and you think I missed something, or have any questions let me know.

Thanks!

Luke

#Containers #MySQL #Podman #mariadb