I found this article and it reminded me of a tool I wrote a couple of months ago called `macondo`.
In this post, I will try to explain what it is, why I wrote it, and some of the drawbacks I’ve found. I am not trying to justify its existence or even suggest this is the right way of doing things. I haven’t decided whether I like it or not.
## TL;DR

`macondo` is a microframework that lets you package your scripts/applications in a Docker container and run them as if they were regular, native programs. It does so by:

- Automating the mounting of local files into the container for scripts that need to interact with them (e.g. tools that work with files in the current directory, or that need to access `~/.aws/credentials`, etc.).
- Making sure that files written/modified from the container have the right user ownership (i.e. the user ID of the files will be the same as that of the host and not of the container).
- Making it easy to distribute/share “`macondo` commands”.
## The problem

Developing/distributing software tools (especially for internal/private usage) poses some challenges:

- Runtimes: people write tools in different languages and runtimes. In the same company, some people use Bash, some like Ruby, or Python, or Java, etc. It is a pain to correctly set up the environment for all of the different tools. The user experience also sucks: a Ruby tool not only needs the right runtime but also requires you to manually set up its dependencies (e.g. installing `bundler` and then running `bundle install`). And this is not just Ruby: almost every language needs a very specific runtime version and configuration to run properly. Languages that compile to native binaries might be more self-contained, but they still need to be compiled for different architectures/OSes. And most importantly, having so many tools/languages available is a good thing: you can use the language that best suits the problem at hand, or the one you are most proficient in, etc.
- Reproducibility: you can’t ensure that the environment of your users is the same one you used to develop/test the application. Maybe your script depends on `jq` being installed, or `terraform`, or `awscliv1` but not `awscliv2`.
## Docker to the rescue, kind of

Docker is a tool that helps alleviate this to some extent. It lets you package and distribute applications in environments tailored to run them without issues. Everything the application depends on is already in the container, so it solves the two problems above. Kind of.

One of the issues with Docker is that it has an awkward interface when it comes to running CLI programs that work with local files. You need to understand which files the container needs and where it needs them, and you have to mount them explicitly, etc.

Let’s make this clearer with an example.
## Example

Say we have an imaginary tool that receives a string as an argument, downloads a random dog image from the Dog API, converts it to black and white, and uploads it to AWS S3. For the sake of simplicity, it is a bash script:
```bash
#!/usr/bin/env bash
set -euo pipefail

DOG_NAME=$1

# download a random dog image
curl -s https://dog.ceo/api/breeds/image/random | \
  jq -r '.message' | \
  xargs wget -O tmp.jpg

# make it black and white
convert tmp.jpg -colorspace gray "$DOG_NAME.jpg"
rm tmp.jpg

# push it to S3
aws --profile some-profile s3 cp "$DOG_NAME.jpg" "s3://some-bucket/dogs/$DOG_NAME.jpg"
```
Pretty simple script. Let’s look at its dependencies: `bash`, `curl`, `jq`, `xargs`, ImageMagick, `awscli`, and valid credentials in `~/.aws/credentials`.
Say we want to distribute this. We package it so that people can run it without worrying about the dependencies:
```dockerfile
FROM alpine
RUN apk add bash jq imagemagick curl py3-pip util-linux
RUN pip3 install awscli
COPY dog.sh /dog.sh
RUN chmod a+x /dog.sh
WORKDIR "/output-folder"
ENTRYPOINT ["/dog.sh"]
```
Now we try to run it:
```
$ docker run -it dog foodogo
The config profile (some-profile) could not be found
```
It failed because `aws` depends on the `~/.aws/credentials` file, but the command is running inside the container, not on the host. So we try:
```
$ docker run -v ~/.aws/credentials:/root/.aws/credentials:ro dog foodogo
$ ls
$ # there are no files :(
```
The dog image was downloaded and pushed to S3, but we can’t see it because it was written inside the container, which no longer exists. So we try:
```
$ docker run -v ~/.aws/credentials:/root/.aws/credentials:ro -v $PWD:/output-folder dog foodogo
$ ls -l
-rw-r--r-- 1 root root 26692 Dec 30 12:39 foodogo.jpg
```
Notice how we had to know the Dockerfile’s structure in order to mount the current directory into the right folder. Also notice that the image is now on our local file system, but it is owned by root. This is problematic: the application would need to know the right user/group ID beforehand and then change the ownership of the file.

You get the idea: not only is it cumbersome to run the container with the right parameters, it also does not necessarily interact nicely with the local file system.
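You could try to work around the ownership problem by hand with Docker’s `--user` flag, but that trades one problem for another: the UID you pass usually has no entry in the container’s `/etc/passwd`, so `$HOME` no longer resolves to `/root` and the credentials mount stops working. A quick sketch of that dead end (same image and mounts as before):

```
$ # run as the host user so output files get the right owner...
$ docker run --user "$(id -u):$(id -g)" \
    -v ~/.aws/credentials:/root/.aws/credentials:ro \
    -v "$PWD":/output-folder \
    dog foodogo
$ # ...but now `aws` resolves ~/.aws/credentials against the new user's
$ # home directory, not /root, so the profile is not found again
```

This is exactly the kind of bookkeeping that would be nice to automate.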
## Enter macondo

Now, here’s how it would work using `macondo`. We add the following annotations to the script:
```bash
#!/usr/bin/env bash
# @from Dockerfile
# @vol PWD:/output-folder
# @vol ~/.aws:~/.aws:ro
set -euo pipefail

# ... the rest is the same
```
Here we see two @annotations that `macondo` will use: `@from` tells it to build the image from the local Dockerfile, and `@vol` tells it which folders to mount and how.
With that, you can run:
```
$ macondo dog.sh foobar
$ ls -l
-rw-r--r-- 1 cristian users 49035 Dec 30 12:51 foobar.jpg
```
Notice that you don’t need to worry about manually mounting any directory, and that the file has the right permissions.
There’s a dry-run flag that can help demystify how this works:
```
$ macondo -0 dog.sh foobar
docker run -i --env HOST_USER_ID=1001 --env HOST_GROUP_ID=100
  --volume /home/cristian/scratch/example-macondo:/output-folder
  --volume /home/cristian/.aws:/home/cristian/.aws:ro
  dog.sh.aligned foobar
```
This is how it works: `macondo` builds a base Docker image using the Dockerfile, but then writes an extra wrapper Docker image (here called `dog.sh.aligned`) with an ephemeral user that has the same name as my user, and the same user and group IDs. That’s how it is able to generate the file with the right ownership.
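The exact wrapper is an implementation detail, but a simplified sketch of the idea (this is illustrative, not the actual generated Dockerfile, and it glosses over whether the IDs are injected at build time or resolved at runtime) looks like this for an Alpine base:

```dockerfile
# Illustrative sketch of an "aligned" wrapper image, not macondo's real output.
# The host user's name and IDs would be injected when the wrapper is built.
FROM dog
ARG USER_NAME
ARG USER_ID
ARG GROUP_ID
# BusyBox addgroup/adduser syntax; an Ubuntu base needs groupadd/useradd
# instead, which is one reason the two base images are handled differently.
RUN addgroup -g "${GROUP_ID}" "${USER_NAME}" && \
    adduser -D -u "${USER_ID}" -G "${USER_NAME}" "${USER_NAME}"
USER ${USER_NAME}
```

Colliding with users or groups that already exist in the base image is one of the edge cases that makes this part fiddly (more on that below).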
## Available annotations

Most of the features are enabled by adding @annotations to the script:

- `@from`: can be `Dockerfile` to use the Dockerfile in the same directory as the command being run/built. It can also receive a one-liner like `@from AlpinePackages jq httpie ...`, which builds an Alpine Linux Docker image with the provided packages installed. It also works with `UbuntuPackages`.
- `@vol`: explicitly mounts directories or files. It has the same syntax as Docker’s `--volume` flag. It also accepts `PWD`, which resolves to the current directory, and `~`, which resolves to `$HOME` (e.g. `@vol ~/foo:/bar`).
- `@needs_ssh`: can be true or false (defaults to false). This mounts the SSH agent socket from the host so that tools that rely on SSH work (e.g. `git clone`).
- `@needs_tty`: runs the Docker image with the `--tty` flag.
- `@align_with_host_user`: enabled by default; creates an ephemeral user that has the same name, user ID, and group ID as the user running `macondo`.
- `@enable_dynamic_volume_mounts`: infers when a command is using a file on the host, even if it’s not in the current directory, and mounts it into the container (e.g. `macondo something ../../some-file` would mount that file or directory into the container so it can be accessed).
- `@user`: overrides the user the container runs as.
- `@workdir`: specifies the working directory of the container.
- `@group`, `@description`, `@name`, `@version`: used to organize, build, and publish Docker images. `macondo` supports the concept of “repositories”, which allow running commands for which the source code is not available locally. More on this later.
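To make these concrete, here is a hypothetical command (the name and contents are invented for illustration) that combines several annotations to clone a private repository over SSH into the current directory:

```bash
#!/usr/bin/env bash
# clone-tools.mcd -- hypothetical example combining several annotations
# @from AlpinePackages git openssh-client
# @needs_ssh true
# @vol PWD:/workdir
# @workdir /workdir
# @group Examples
# @description Clones the internal tools repo into the current directory
# @version 0.1.0
set -euo pipefail

git clone git@github.com:some-org/internal-tools.git
```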
It’s worth noting that, other than the extra annotations, the script can still be run directly (in the example above, `./dog.sh`).

Also, this works for any kind of script (Python, Ruby, etc.) and any kind of application. For instance, this is the “macondo file” for a JVM application written in Ballerina:
```
# @from Dockerfile
# @version 0.1.0
# @description Takes a savepoint from the specified Flink job
# @group Flink
```
In this case, it is the responsibility of the Dockerfile to build/package the application in the right way.
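For illustration, such a Dockerfile for a JVM tool could be a multi-stage build along these lines (the base images, build tool, and jar path are assumptions for the sake of the sketch, not the Dockerfile from that project):

```dockerfile
# Hypothetical multi-stage build for a JVM command-line tool.
# Stage 1: build a self-contained jar from the sources.
FROM eclipse-temurin:17 AS build
WORKDIR /src
COPY . .
RUN ./gradlew --no-daemon shadowJar   # assumes the repo ships a Gradle wrapper

# Stage 2: ship only a JRE plus the jar to keep the image smaller.
FROM eclipse-temurin:17-jre
COPY --from=build /src/build/libs/app-all.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]
```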
## Publishing commands

So far we have used `macondo` to run a local script, but that’s not its intended use. In my use case, I have a plethora of commands and CLI tools that I make available to coworkers.

The idea is that they only need to install `macondo` once and set up one or more repositories that allow them to run these commands. They do not need access to the source code or binaries of the applications because these live in a Docker registry.

`macondo` repositories are YAML manifests that contain information about the programs, their annotations, and their Docker image location. These can be created using the `macondo build -y` command:
```
$ mv dog.sh dog.mcd # commands you want to publish need to have the .mcd extension
$ macondo build -y -p cristianc/dog . | tee repository.yaml
Built dog.mcd and published to cristianc/dog:dog.mcd-0.1.0
---
commands:
  - name: dog
    group: ""
    # ... etc.
```
The previous command builds our dog `macondo` command (which is in the current directory, hence the `.`), publishes the Docker image to Docker Hub (in a real-world case this would be an ECR repository, for instance), and writes a YAML repository file to `repository.yaml`.
This YAML file is what you will actually distribute. I usually keep it in Artifactory or a Gist, and hand it to my users (i.e. coworkers).
By the way, `macondo build` will traverse the provided directory looking for `.mcd` files to build, and it will publish them all. Here’s an example run on a real-world directory of commands:
```
$ macondo build ~/src/company-cmds/
Built flink-inspect.mcd
Built get-ladot-device.mcd
Built ververica-gen-tfm.mcd
Built flink-list.mcd
Built flink-deploy.mcd
Built flink-gen-tfm.mcd
Built suspend-job.mcd
Built flink-jenkins.mcd
Built flink-savepoint.mcd
Built build-flink.mcd
```
To add a repository to `macondo`, use the `macondo repo` command:

```
$ macondo repo add manifest.yaml # or an HTTP URL
```
Another advantage of putting the manifest behind a static HTTP endpoint is that you can overwrite it with newer versions of your commands, or with additional commands, and all your users need to do to get them is run `macondo repo update`.
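For example (the URL is a placeholder):

```
$ macondo repo add https://example.com/tools/repository.yaml
$ # later, after the manifest has been updated upstream:
$ macondo repo update
```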
All commands in the added repository become accessible as `macondo` subcommands. For example, this shows all the commands my current `macondo` repositories provide:
```
$ macondo
You forgot to provide a command...

Flink commands:
  flink-gen-tfm      Generates scaffolding terraform for a Flink job
  flink-savepoint    Takes a savepoint from the specified Flink job
  flink-deploy       Deploys a Flink job to any environment
  flink-jenkins      Generates Jenkins task for a Flink job
  build-flink        Builds Flink from source code
  flink-inspect      Scrapes information from a job running on the provided job manager's url(s)
  flink-list         Lists all jobs currently running on a particular Kubernetes cluster

Ververica commands:
  suspend-job        Generates Flink CI/CD terraform from a Ververica deployment
  ververica-gen-tfm  Generates Flink CI/CD terraform from a Ververica deployment

Data Team commands:
  get-ladot-device   Gets the latest state of a vehicle in the LADOT account

Personal commands:
  process-videos     Concats all the videos in the provided folder and pushes them to Youtube
```
## The good and the bad parts

So far my experience with this approach has been pleasant, but of course I am biased. I’ve come to use it even for personal tools. These are the bits I like:

- It can be used by anyone who has `docker` installed, regardless of whether they use Mac or Linux (I haven’t tested this on Windows).
- You can iterate on the tools locally and publish them by updating the manifest. All your users need to do to access new commands or new versions of them is run `macondo update`.
- It makes the development of polyglot tools a breeze. As long as you can package it in Docker, anyone can run it.
These are the things I dislike:

- The “fake user ID” part is a bit hacky. So far I support only Ubuntu and Alpine Docker base images, and even for those two I have to do it differently.
- The Docker images can be too fat for what they package. At least the first time a user runs a `macondo` command, it will take a while because the image has to be fetched and then wrapped to fake the user ID.
- It obscures the application. When a regular script does not work, you can at least inspect it and figure things out. When it is run through something like `macondo`, wrapped in Docker, etc., it becomes harder to figure out what happened when things do not go as planned.