🦫 Beaver

Beaver is a minimal build system geared towards scientific programming and reproducibility. It uses the Python programming language to express how transforms generate outputs from inputs. If you’re familiar with Python, using Beaver couldn’t be easier, as the following example demonstrates.

# A simple example (saved as `beaver.py`) to generate `output.txt` with content `hello`.
import beaver_build as bb

transform = bb.Shell(outputs="output.txt", inputs=None, cmd="echo hello > output.txt")

Executing Beaver from the command line generates the desired output.

$ beaver output.txt
🦫 INFO: 🟡 artifacts [output.txt] are stale; schedule transform
🦫 INFO: ⚙️ execute shell command `echo hello > output.txt`
🦫 INFO: ✅ generated artifacts [output.txt]
$ cat output.txt
hello

This seems like a convoluted way to write hello to output.txt. So what’s going on? The statement bb.Shell(...) defines a Transform that generates the Artifact output.txt by executing the shell command echo hello > output.txt. Executing beaver output.txt on the command line asks Beaver to generate that artifact, which it gladly does.

Why should we care? Transforms can be chained by using the outputs of one as the inputs for another. Beaver ensures that all transforms are executed in the correct order and parallelizes steps where possible. These are of course the tasks of any build system, but Beaver’s unique selling points are (see Why not use …? for further details):

  • users do not need to learn a domain-specific language; they create and chain transforms using flexible Python syntax.

  • new transforms can be implemented easily by inheriting from Transform and implementing the apply method.

  • scheduling of operations is delegated to Python’s asyncio package, which both minimizes the potential for bugs (compared with a custom implementation) and simplifies parallelization.

  • defining artifacts and transforms as Python objects allows for extensive introspection, such as visualization of the induced directed acyclic bipartite graph.
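To make the asyncio point concrete, here is a minimal, self-contained sketch of how a dependency graph can be scheduled with asyncio so that independent steps run concurrently. It does not use Beaver and is not Beaver's implementation; the graph, the artifact names, and the build and main functions are all hypothetical and exist only for illustration.

```python
import asyncio

# Hypothetical dependency graph: each artifact maps to the artifacts it needs.
GRAPH = {
    "raw.txt": [],
    "clean.txt": ["raw.txt"],
    "stats.txt": ["clean.txt"],
    "plot.txt": ["clean.txt"],
    "report.txt": ["stats.txt", "plot.txt"],
}


async def build(name, tasks, log):
    """Build `name` once all of its inputs have been built."""
    # Await every parent task; parents that do not depend on each other
    # make progress concurrently on the event loop.
    await asyncio.gather(*(tasks[parent] for parent in GRAPH[name]))
    await asyncio.sleep(0.01)  # stand-in for the real work of a transform
    log.append(name)


async def main(target):
    """Schedule `target` and everything it depends on; return the build order."""
    tasks = {}
    log = []

    def schedule(name):
        # Create exactly one task per artifact, parents first, so that
        # `tasks[parent]` exists by the time `build` starts running.
        if name not in tasks:
            for parent in GRAPH[name]:
                schedule(parent)
            tasks[name] = asyncio.create_task(build(name, tasks, log))
        return tasks[name]

    await schedule(target)
    return log


if __name__ == "__main__":
    print(asyncio.run(main("report.txt")))
```

In this sketch, stats.txt and plot.txt both wait only on clean.txt, so they run side by side, while report.txt waits for both. The ordering falls out of awaiting tasks rather than from a hand-written scheduler, which is the appeal of delegating to asyncio.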

Other features include:

  • incremental builds based on artifact digests, whether artifacts are files or not.
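The idea behind digest-based incremental builds can be sketched in a few lines of standard-library Python. This is an illustration of the general technique, not Beaver's code: the function names (digest, is_stale, record) and the JSON cache file are assumptions made for this example, and it covers file artifacts only.

```python
import hashlib
import json
from pathlib import Path


def digest(path):
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def _load(cache_path):
    """Load the digest cache, or return an empty cache if none exists yet."""
    cache_path = Path(cache_path)
    return json.loads(cache_path.read_text()) if cache_path.exists() else {}


def is_stale(output, inputs, cache_path):
    """Return True if `output` is missing or its inputs changed since `record`."""
    current = {str(p): digest(p) for p in inputs}
    recorded = _load(cache_path).get(str(output))
    return not Path(output).exists() or recorded != current


def record(output, inputs, cache_path):
    """Remember the input digests that `output` was generated from."""
    cache = _load(cache_path)
    cache[str(output)] = {str(p): digest(p) for p in inputs}
    Path(cache_path).write_text(json.dumps(cache, indent=2))
```

Because the comparison uses content digests rather than modification times, rewriting an input with identical content does not mark the output as stale, whereas a timestamp-based tool such as make would rebuild.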

Why not use …?

  • ant uses relatively verbose XML syntax and is limited in its flexibility, e.g. transforms cannot easily be generated on the fly.

  • bazel focuses on speed and correctness, and it delivers both extremely well. Bazel achieves these goals by “[taking] some power out of the hands of engineers”. This is a good compromise for production systems, but, for scientific applications, we want to retain a high degree of flexibility.

  • make is a trusted build tool, but Makefiles can quickly become complex and modularizing is difficult.

  • maven is primarily Java focused and relies on conventions to generate artifacts. Well-established conventions are essential for software development, especially in large teams, but are often lacking in the context of investigating a new scientific problem.

  • pydoit uses standard Python syntax to collect task metadata akin to test discovery in pytest. However, dodo.py files are sometimes difficult to read because the code does not directly express the tasks to execute.

  • snakemake uses a non-standard extension of Python syntax, which steepens the learning curve.