In this post, I will try to explain what it is, why I wrote it, and some of the drawbacks I’ve found. I am not trying to justify its existence or even suggest this is the right way of doing things. I haven’t decided whether I like it or not.
macondo is a microframework that lets
you package your scripts/applications in a Docker container, and run them as if
they were regular, native programs. It does so by:
- Automating the mounting of local files into the container for scripts that need to interact with them (e.g. tools that work with files in the current directory, or that need to access files elsewhere on the host).
- Making sure that files written/modified from the container have the right user ownership (i.e. the user id of the files will be the same as that of the host and not of the container).
- Making it easy to distribute/share “commands” with other people (more on this below).
Developing/distributing software tools (especially for internal/private usage) poses some challenges:
Runtimes: people write tools in different languages and runtimes. In the same company, some people use bash, some like Ruby, or Python, or Java, etc. It is a pain to correctly set up the environment for all of the different tools.
The user experience also sucks: a Ruby tool not only needs the right runtime but also requires you to manually set up its dependencies (e.g. installing the right gems).
Again, this is not just for Ruby. Almost every language needs a very specific runtime version and config to run properly.
Languages that compile to native binaries might be more self-contained, but they still need to be compiled for different architectures/OS.
And most importantly, having so many tools/languages available is good. You can use the language that best suits the problem at hand, or the one where you are most proficient, etc.
Reproducibility: you can’t ensure that the environment of your users is the same one you used to develop/test the application. Maybe your script depends on jq being installed, or on a specific version of some other tool.
Docker to the rescue, kind of
Docker is a tool that helps alleviate this to some extent. It lets you package and distribute applications in environments tailored to run them without issues. Everything the application depends on is already in the container and so it solves the two problems above, kind of.
One of the issues with Docker is that it has an awkward interface when it comes to running CLI programs that work with local files. You need to understand what files the container needs and where it needs them, and you have to explicitly mount them, etc.
Let’s make this clearer with an example.
Say we have an imaginary tool that receives a random string as an argument and then downloads a random dog image from Dog API, converts it into black and white, and uploads it to AWS S3. For the sake of simplicity, it is a bash script:
```bash
#!/usr/bin/env bash

DOG_NAME=$1

# download random dog
curl -s https://dog.ceo/api/breeds/image/random | \
  jq -r '.message' | \
  xargs wget -O tmp.jpg

# make it black and white
convert tmp.jpg -colorspace gray $1.jpg
rm tmp.jpg

# push it to s3
aws --profile some-profile s3 cp $1.jpg s3://some-bucket/dogs/$1.jpg
```
Pretty simple script. Let’s see its dependencies: bash, curl, jq, wget, imagemagick (for convert), awscli, and good credentials in ~/.aws/credentials for the some-profile profile.
Say we want to distribute this and we package it so people can run it without worrying about the dependencies:
```dockerfile
FROM alpine

RUN apk add bash jq imagemagick curl py3-pip util-linux
RUN pip3 install awscli icdiff

COPY dog.sh /dog.sh
RUN chmod a+x /dog.sh

WORKDIR "/output-folder"
ENTRYPOINT ["/dog.sh"]
```
Now we try to run it:
```shell
$ docker run -it dog foodogo

The config profile (some-profile) could not be found
```
It failed because aws depends on the ~/.aws/credentials file, but the command is running inside the container, not on the host. So we try:
```shell
$ docker run -v ~/.aws/credentials:/root/.aws/credentials:ro dog foodogo
$ ls
$ # there are no files :(
```
The dog image was downloaded and pushed to S3, but we can’t see it locally because it was written inside the container, which no longer exists. So we try:
```shell
$ docker run -v ~/.aws/credentials:/root/.aws/credentials:ro -v $PWD:/output-folder dog foodogo
$ ls -l
-rw-r--r-- 1 root root 26692 Dec 30 12:39 foodogo.jpg
```
Notice how we had to know about the Dockerfile structure so we could mount the current directory into the right folder. Also notice that the image is now in our local system but it is owned by the root user. This is problematic. The application would need to know beforehand the right user/group ID and then change the ownership of the file.
You get the idea… not only is it cumbersome to run the container with the right parameters, it also does not necessarily interact nicely with the local file system.
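For comparison, a common manual mitigation (not something the post uses) is Docker’s own --user flag, which runs the container process with the host’s user and group IDs. A sketch that only assembles and prints the command, so you can see its shape:

```shell
#!/usr/bin/env bash
# Sketch of the usual manual workaround (not part of macondo): pass the host's
# uid:gid to docker via --user so files written into the bind mount end up
# owned by the host user. We only assemble and print the command here; run it
# on a machine where docker and the "dog" image exist.
uid_gid="$(id -u):$(id -g)"
cmd="docker run -v $HOME/.aws/credentials:/root/.aws/credentials:ro -v $PWD:/output-folder --user $uid_gid dog foodogo"
echo "$cmd"
```

The catch: the resulting container user has no passwd entry, so $HOME is unset inside the container and aws would not look in /root/.aws anyway. That limitation is presumably why macondo builds a wrapper image with a real matching user instead.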
Now, here’s how it would work using macondo. We add the following annotations to the script:
```bash
#!/usr/bin/env bash
# @from Dockerfile
# @vol PWD:/output-folder
# @vol ~/.aws:~/.aws:ro

set -euo pipefail

# ... the rest is the same
```
We see here two kinds of @annotations that macondo will use: @from tells it to build the image from the local Dockerfile, and @vol tells it what folders to mount and how (the :ro suffix makes a mount read-only).
With that, you can run:
```shell
$ macondo dog.sh foobar
$ ls -l
-rw-r--r-- 1 cristian users 49035 Dec 30 12:51 foobar.jpg
```
Notice that you don’t need to worry about manually mounting any directory, and that the file has the right permissions.
There’s a dry-run flag that can help demystify how this works:
```shell
$ macondo -0 dog.sh foobar
docker run -i --env HOST_USER_ID=1001 --env HOST_GROUP_ID=100 \
  --volume /home/cristian/scratch/example-macondo:/output-folder \
  --volume /home/cristian/.aws:/home/cristian/.aws:ro \
  dog.sh.aligned foobar
```
This is how it works: macondo builds a base Docker image using the Dockerfile, but then writes an extra wrapper Docker image (here called dog.sh.aligned) with an ephemeral user that has the same name as my user, and the same user and group IDs. That’s how it is able to generate the file with the right ownership.
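The post doesn’t show the generated wrapper, but from the dry-run output one can imagine it roughly as follows. This is a guess at a build-time variant for an Alpine base; the HOST_USER_ID/HOST_GROUP_ID env vars in the dry run suggest macondo may also do part of this at container start, and the post notes the mechanics differ between Alpine and Ubuntu:

```dockerfile
# Hypothetical "aligned" wrapper image (Alpine flavor). All names and the
# exact mechanism are guesses; only the idea -- a container user matching the
# host's name, uid, and gid -- comes from the post.
FROM dog
ARG HOST_USER=cristian
ARG HOST_USER_ID=1001
ARG HOST_GROUP_ID=100
# create a matching group (tolerating one that already exists) and user
RUN addgroup -g "$HOST_GROUP_ID" "$HOST_USER" 2>/dev/null; \
    adduser -D -u "$HOST_USER_ID" \
      -G "$(getent group "$HOST_GROUP_ID" | cut -d: -f1)" "$HOST_USER"
USER cristian
WORKDIR /output-folder
```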
Most of the features are enabled by adding @annotations to the script.
- @from: can be Dockerfile to use the Dockerfile in the same directory as the command being run/built. It can also receive a one-liner like @from AlpinePackages jq httpie ..., which builds an Alpine Linux Docker image with the provided packages installed. It also works with Ubuntu-based images.
- @vol: explicitly mounts directories or files. It has the same syntax as Docker’s --volume flag. It also accepts PWD, which resolves to the current directory, and ~, which resolves to the user’s home directory.
- @needs_ssh: can be true or false (defaults to false). This mounts the host’s SSH agent socket so that tools that rely on SSH authentication work (e.g. git over SSH).
- @needs_tty: runs the Docker image with the --tty flag so interactive programs work.
- @align_with_host_user: enabled by default; creates an ephemeral user that has the same name, user ID, and group ID as the user running the command.
- @enable_dynamic_volume_mounts: infers when a command is using a file on the host, even if it’s not in the current directory, and mounts it into the container (e.g. macondo something ../../some-file would mount that file or directory into the container so it can be accessed).
- @user: overrides the user the container runs with.
- @workdir: specifies the working directory of the container.
- @group, @description, and @version: used to organize, build, and publish Docker images.
macondo supports the concept of “repositories”, which allow you to run commands for which the source code is not available locally. More on this later.
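To make the annotation format concrete, here is a tiny sketch of how extraction and the PWD/~ expansion could work. This is illustrative shell only, not macondo’s actual parser:

```shell
#!/usr/bin/env bash
# Sketch only: extract "# @key value" annotations and expand @vol shorthands.
# Not macondo's real implementation.

annotations() {
  # print "key value" for every "# @key value" line in the script
  grep '^# @' "$1" | sed 's/^# @//'
}

resolve_vol() {
  # expand the PWD and ~ shorthands accepted by @vol
  printf '%s\n' "$1" | sed -e "s#^PWD#$PWD#" -e "s#~#$HOME#g"
}

# demo against a scratch copy of the annotated script
cat > /tmp/dog-demo.sh <<'EOF'
#!/usr/bin/env bash
# @from Dockerfile
# @vol PWD:/output-folder
# @vol ~/.aws:~/.aws:ro
echo "the rest of the script"
EOF

annotations /tmp/dog-demo.sh
# prints:
# from Dockerfile
# vol PWD:/output-folder
# vol ~/.aws:~/.aws:ro

resolve_vol "PWD:/output-folder"   # e.g. /home/cristian/scratch:/output-folder
```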
It’s worth noting that, other than the extra annotations, the script can still be run directly (in the example above, plain ./dog.sh foobar still works).
Also, this works for any kind of script (Python, Ruby, etc.) and any kind of application. For instance, this is the “macondo file” for a JVM application written in Ballerina:
```bash
# @from Dockerfile
# @version 0.1.0
# @description Takes a savepoint from the specified Flink job
# @group Flink
```
In this case, it is the responsibility of the Dockerfile to build/package the application in the right way.
So far we have used
macondo to run a local script. But that’s not its intended
use. In my use case, I have a plethora of commands and CLI tools that I make
available to coworkers.
The idea is that they only need to install
macondo once and set up one or more
repositories that allow them to run these commands. They do not need access to
the source code or binaries of the applications because these live in a Docker registry.
macondo repositories are YAML manifests that contain information about the
programs, their annotations, and their Docker image location. These can be
created using the
macondo build -y command:
```shell
$ mv dog.sh dog.mcd # commands you want to publish need to have the .mcd extension
$ macondo build -y -p cristianc/dog . | tee repository.yaml
Built dog.mcd and published to cristianc/dog:dog.mcd-0.1.0
---
commands:
- name: dog
  group: ""
  ... etc.
```
The previous command builds our dog
macondo command (which is in the current
directory, hence the
.), publishes the Docker image to DockerHub (in a
real-world case this would be an ECR repository, for instance), and writes a YAML repository file to stdout, which tee captures into repository.yaml.
This YAML file is what you will actually distribute. I usually keep it in Artifactory or a Gist, and hand it to my users (i.e. coworkers).
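For reference, extrapolating from the snippet that macondo build printed above, a full manifest entry might look something like this. Only commands, name, and group actually appear in the post; every other field is a guess at what macondo would need (image location, annotations, metadata):

```yaml
# Hypothetical repository.yaml; field names beyond name/group are guesses.
commands:
  - name: dog
    group: ""
    description: Downloads a random dog picture and pushes it to S3
    version: 0.1.0
    image: cristianc/dog:dog.mcd-0.1.0
    annotations:
      vol:
        - PWD:/output-folder
        - ~/.aws:~/.aws:ro
```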
By the way,
macondo build will traverse the provided directory looking for
.mcd files to build, and it will publish them all. Here’s an example run of a
real-world directory with commands:
```shell
$ macondo build ~/src/company-cmds/
Built flink-inspect.mcd
Built get-ladot-device.mcd
Built ververica-gen-tfm.mcd
Built flink-list.mcd
Built flink-deploy.mcd
Built flink-gen-tfm.mcd
Built suspend-job.mcd
Built flink-jenkins.mcd
Built flink-savepoint.mcd
Built build-flink.mcd
```
To add a repository to macondo use the
macondo repo command:
```shell
$ macondo repo add manifest.yaml # or http url
```
Another advantage of putting the manifest behind a static HTTP endpoint is that you can override it with newer versions of your commands, or more commands, and all your users need to do to access them is run macondo repo update.
All commands in the added repositories become accessible as macondo subcommands. For example, this shows all the commands my current macondo repositories provide:
```shell
$ macondo
You forgot to provide a command...

Flink commands:
flink-gen-tfm      Generates scaffolding terraform for a Flink job
flink-savepoint    Takes a savepoint from the specified Flink job
flink-deploy       Deploys a Flink job to any environment
flink-jenkins      Generates Jenkins task for a Flink job
build-flink        Builds Flink from source code
flink-inspect      Scrapes information from a job running on the provided job manager's url(s)
flink-list         Lists all jobs currently running on a particular Kubernetes cluster

Ververica commands:
suspend-job        Generates Flink CI/CD terraform from a Ververica deployment
ververica-gen-tfm  Generates Flink CI/CD terraform from a Ververica deployment

Data Team commands:
get-ladot-device   Gets the latest state of a vehicle in the LADOT account

Personal commands:
process-videos     Concats all the videos in the provided folder and pushes them to Youtube
```
The good and the bad parts
So far my experience with this approach has been pleasant. But of course I am biased. I’ve come to use it even for personal tools. These are the bits I like:
- Can be used by anyone that has docker installed, regardless of whether they use Mac or Linux (I haven’t tested this on Windows).
- You can iterate on the tools locally, and publish them by updating the manifest. All your users need to do to access new commands or versions of them is run macondo repo update.
- It makes the development of polyglot tools a breeze. As long as you can package a tool in Docker, anyone can run it.
These are the things I dislike:
- The “fake user id” part is a bit hacky. So far I support Ubuntu and Alpine Docker base images only, and even for those two I have to do it differently.
- The Docker images might sometimes be too fat for what they package. At least the first time a user runs a macondo command it will take a while, because the image has to be fetched and then wrapped to fake the user ID.
- It obscures the application. When you run a regular script and it does not work, you can at least inspect it and figure out what went wrong. When it is run through macondo, wrapped in Docker, etc., it becomes harder to debug when things do not go as planned.