15 October 2023

Action Resourcing in Bazel 6

To achieve fast builds, Bazel attempts to maximise concurrency. However, not all tasks are equal: some may completely saturate the CPU cores they run on, while others may be memory intensive.

Running several tasks that put pressure on the same set of resources at once can reduce reliability (e.g. tests timing out). Depending on the workloads involved, it may even slow down the build overall. For example, too many concurrent CPU-bound processes lead the OS to context switch more frequently, which wastes time compared with letting the active processes run to completion or go idle.

In Bazel, most tasks of interest are represented as actions. Actions are declared in rules (e.g. ctx.actions.run(...)), and behind the scenes tests are run as actions with the mnemonic TestRunner. We can attach metadata to these actions describing their resource requirements, which Bazel uses when making scheduling decisions.

APIs may change over time; this post covers Bazel 6.

Defining the Resource Pools

Pools for the most common resource types, CPU and RAM, are sized automatically by Bazel but can also be set explicitly.

By default Bazel will use all available CPU cores (HOST_CPUS) and 67% of RAM (HOST_RAM*.67).

Want to try and keep a CPU core free?

bazel build --local_cpu_resources=HOST_CPUS-1 ...

Double it? Keep in mind that systems with simultaneous multi-threading (SMT) may appear to have double their actual core count, in which case this quadruples it.

bazel build --local_cpu_resources=HOST_CPUS*2 ...
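On GNU/Linux, one way to check whether SMT is doubling the apparent core count is to count the distinct (core, socket) pairs reported by `lscpu -p`, since hardware threads on the same core share a pair. This is a minimal sketch assuming util-linux's `lscpu` is installed; `physical_cores` is a hypothetical helper name, and Bazel's own detection of HOST_CPUS may not agree with it (for example inside containers).

```shell
# Count physical cores from `lscpu -p=CORE,SOCKET` output read on stdin.
# Each non-comment line is one logical CPU; SMT siblings share the same
# (core, socket) pair, so counting unique pairs gives the physical core count.
physical_cores() {
    grep -v '^#' | sort -u | wc -l | tr -d ' '
}

# Usage (assumes lscpu is available):
#   cores=$(lscpu -p=CORE,SOCKET | physical_cores)
#   bazel build --local_cpu_resources="$cores" ...
```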

In theory you could also force sequential execution with:

bazel build --local_cpu_resources=1 ...

RAM can be trickier. Whereas most systems can reasonably be expected to have idle CPUs before a build starts, RAM is often used as a cache for slower persistent storage. As it fills up, the OS may move less frequently used portions out to a page file, and Bazel itself is liable to take up a chunk for its own purposes.

The total RAM also matters. 67% of 8GB (about 5.4GB) could exceed what is actually unallocated, whereas 67% of 256GB leaves a lot on the table (about 84GB).
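Those two figures can be sanity-checked with a hypothetical helper that mirrors the default `HOST_RAM*.67` in whole megabytes (rounding down; Bazel's own rounding may differ):

```shell
# Mirror Bazel's default RAM pool (HOST_RAM*.67) for a given total in MB,
# printing the pool and what is left outside it (integer MB, rounded down).
default_ram_split() {
    total_mb=$1
    pool_mb=$(( total_mb * 67 / 100 ))
    echo "pool=${pool_mb} left=$(( total_mb - pool_mb ))"
}

default_ram_split 8192     # 8GB host   → pool=5488 left=2704
default_ram_split 262144   # 256GB host → pool=175636 left=86508
```

The 8GB host gets a pool of about 5.4GB before the OS, other programs and the Bazel server are accounted for, while the 256GB host leaves roughly 84.5GB outside the pool.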

A possible solution (on GNU/Linux) is to base the RAM pool on the currently free RAM, minus what Bazel itself will likely need. Note that this is not a complete solution: on systems with little or no free memory, Bazel would be given invalid input (a negative value).

# Free RAM in MB: MemFree + Buffers + Cached (/proc/meminfo reports kB)
free_mb=$(awk '/^(MemFree|Buffers|Cached):/ { kb += $2 } END { print int(kb / 1024) }' /proc/meminfo)
# Leave ~300MB for Bazel itself
ram_pool=$((free_mb - 300))
bazel build --local_ram_resources="$ram_pool" ...
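One way to avoid the negative value is to check the result before handing it to Bazel. The sketch below is an assumption, not something Bazel provides: the hypothetical `ram_pool` helper takes a free-RAM figure in MB, subtracts headroom (300MB by default, as above) and falls back to Bazel's default expression `HOST_RAM*.67` when nothing positive is left.

```shell
# Turn a free-RAM figure (MB) into a value for --local_ram_resources,
# reserving headroom for Bazel itself and falling back to Bazel's default
# expression when the result would be zero or negative.
ram_pool() {
    free_mb=$1
    headroom_mb=${2:-300}
    pool_mb=$(( free_mb - headroom_mb ))
    if [ "$pool_mb" -gt 0 ]; then
        echo "$pool_mb"
    else
        echo 'HOST_RAM*.67'
    fi
}

ram_pool 3100   # → 2800
ram_pool 200    # → HOST_RAM*.67

# Usage, with free_mb computed from /proc/meminfo as above:
#   bazel build --local_ram_resources="$(ram_pool "$free_mb")" ...
```

Falling back to the default is only one choice; failing the build outright may be preferable where low memory should be investigated.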

Custom Resource Types

CPU and RAM only cover the most common scenarios. As projects grow, more specialised resources may be needed.

Rendering an image or video with dedicated hardware? You might want to define pools for the memory, shader units, etc. For this example we'll just provide a pool for the number of connected GPUs.

bazel build --local_extra_resources=gpu=2 ...

Perhaps you have actions which need vast amounts of temporary storage. It could be raw uncompressed video that is generated faster than it can be compressed, generated test data, etc. Bazel does not interpret the unit of a custom resource, so pick one (here, GB) and use it consistently.

bazel build --local_extra_resources=storage=4500 ...
                                          # ^ 4,500 GB

Specifying Resource Requirements

First up, for trivial actions the default resource requirements Bazel sets may be enough.

For rule actions:
- RAM: 250MB
- CPU: 1

For tests:
- CPU: 1
- RAM (based on the size attribute):
  - small: 20MB
  - medium (default): 100MB
  - large: 300MB
  - enormous: 800MB
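To see how these defaults interact with the pools, here is a deliberately simplified model. With every test requesting 1 CPU and the RAM implied by its size, the number that can run at once is bounded by whichever pool runs out first. The helpers `test_ram_mb` and `max_tests` are hypothetical, and the model ignores --jobs, --local_test_jobs and any other actions competing for the same pools.

```shell
# Default RAM (MB) assumed for a test in Bazel 6, keyed by its size attribute.
test_ram_mb() {
    case "${1:-medium}" in
        small) echo 20 ;;
        medium) echo 100 ;;
        large) echo 300 ;;
        enormous) echo 800 ;;
        *) echo "unknown size: $1" >&2; return 1 ;;
    esac
}

# Simplified model: how many tests of one size fit at once in the given pools.
# Arguments: CPU pool, RAM pool in MB, test size.
max_tests() {
    cpu_pool=$1
    ram_pool_mb=$2
    ram_mb=$(test_ram_mb "$3") || return 1
    by_ram=$(( ram_pool_mb / ram_mb ))
    # Each test defaults to 1 CPU, so the CPU pool alone allows cpu_pool tests.
    if [ "$cpu_pool" -lt "$by_ram" ]; then echo "$cpu_pool"; else echo "$by_ram"; fi
}

max_tests 8 5488 large      # → 8 (CPU is the limit)
max_tests 8 5488 enormous   # → 6 (RAM is the limit)
```

With an 8-core host and the default pool of an 8GB machine (67%, i.e. 5488MB), RAM rather than CPU becomes the limit once tests are marked enormous.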

Not enough? Keep reading.

Execution Requirements

Resource requirements for actions are set via the execution_requirements parameter of ctx.actions.run. Sadly, RAM cannot currently be set this way.

def _my_rule_impl(ctx):
    # ...
    ctx.actions.run(
        # ...
        execution_requirements = {
            # Requires 4 CPU cores
            "cpu:4": "",
            # References the "storage" custom resource type.
            # Requires 1,500GB of storage
            "resources:storage:1500": "",
        },
    )

my_rule = rule(
    implementation = _my_rule_impl,
    # ...
)
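The same bounding logic extends to several resource types at once. As a rough sketch (not Bazel's actual scheduler, and assuming the "cpu:4" and "resources:storage:1500" requirements above are honoured as described), the hypothetical `max_concurrent` helper takes one POOL:REQUEST pair per resource and reports how many copies of the action could run side by side:

```shell
# Simplified model: how many copies of one action can run at once when it
# needs several resources. Each argument is POOL:REQUEST for one resource
# type; the tightest resource wins. Integer amounts only.
max_concurrent() {
    best=
    for pair in "$@"; do
        pool=${pair%%:*}
        req=${pair#*:}
        fit=$(( pool / req ))
        if [ -z "$best" ] || [ "$fit" -lt "$best" ]; then best=$fit; fi
    done
    echo "$best"
}

# Hypothetical hosts: 8 or 16 CPUs (4 per action), 4500 storage units (1500 per action).
max_concurrent 8:4 4500:1500    # → 2 (CPU is the bottleneck)
max_concurrent 16:4 4500:1500   # → 3 (storage is the bottleneck)
```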

Via Tags (experimental)

Execution requirements can be indirectly specified via tags in conjunction with the --experimental_allow_tags_propagation flag. When the flag is used, tags will be propagated into execution_requirements in the form "<tag-name>": "".

For example:

my_rule(
    # ...
    tags = [
        "cpu:4",
    ],
)

is equivalent to:

def _my_rule_impl(ctx):
    # ...
    ctx.actions.run(
        # ...
        execution_requirements = {
            "cpu:4": "",
        },
    )

my_rule = rule(
    implementation = _my_rule_impl,
    # ...
)

The key difference is that tags are specified per target, as opposed to per rule.

Via Mnemonic

With the flag --modify_execution_info it is possible to modify an action's execution_requirements (called "execution info" here) by targeting its mnemonic (an optional action identifier that may not be unique).

Some examples of built-in action mnemonics include:

- Genrule, used by the genrule rule.
- TestRunner, used for all tests.
- CppCompile, used by the built-in C/C++ rules.
- Javac, used by the built-in Java rules.
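To find out which mnemonics your own build produces, `bazel aquery` can list the actions behind a set of targets. This sketch assumes aquery's default text output, in which each action includes an indented `Mnemonic: <name>` line; `list_mnemonics` is a hypothetical helper and `//...` stands in for whatever targets you care about.

```shell
# Print each distinct mnemonic found in `bazel aquery --output=text` output on stdin.
list_mnemonics() {
    awk '$1 == "Mnemonic:" { print $2 }' | sort -u
}

# Usage:
#   bazel aquery '//...' --output=text | list_mnemonics
```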

Despite being a dedicated API, it has the same key-only limitation as the tag-based approach. It also quickly becomes cumbersome to use, as the flag cannot be repeated: every entry must go into a single comma-separated value.

bazel build --modify_execution_info=Genrule=+cpu:4,TestRunner=+cpu:6 ...

The flag allows keys to be added (+) and removed (-). Mnemonics are matched with regular expressions, so a single entry can target several mnemonics.

bazel build --modify_execution_info='.*=+cpu:2' ...
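Because every entry has to go into one comma-separated value, it can help to keep the entries as separate strings in a script and join them just before invoking Bazel. A minimal sketch with a hypothetical `join_exec_info` helper (it assumes no entry itself contains a comma):

```shell
# Join per-mnemonic edits into the single comma-separated value
# passed to --modify_execution_info.
join_exec_info() {
    out=
    for entry in "$@"; do
        out=${out:+$out,}$entry
    done
    printf '%s\n' "$out"
}

join_exec_info 'Genrule=+cpu:4' 'TestRunner=+cpu:6'
# → Genrule=+cpu:4,TestRunner=+cpu:6

# Usage:
#   bazel build --modify_execution_info="$(join_exec_info 'Genrule=+cpu:4' 'TestRunner=+cpu:6')" ...
```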

Resource Set (experimental)

This experimental API is exposed through the resource_set parameter of ctx.actions.run. Unlike execution_requirements it does support RAM, although it does not handle custom resources.

To use this, you'll need to run Bazel with the flag --experimental_action_resource_set.

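# Must be a top-level function. Called with the OS name (e.g. "linux")
# and the number of inputs to the action.
def _resource_estimator(os, inputs_size):
    return {
        "memory": 25.15 * inputs_size,  # MB
        "cpu": 2,
    }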

def _my_rule_impl(ctx):
    # ...
    ctx.actions.run(
        # ...
        resource_set = _resource_estimator,
    )

my_rule = rule(
    implementation = _my_rule_impl,
    # ...
)
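Since inputs_size is the number of inputs to the action rather than their size in bytes, the estimator above asks for about 25MB of RAM per input. Evaluating the same formula outside Bazel gives a feel for the numbers; the hypothetical `estimate_memory_mb` helper below uses awk for the floating-point multiply.

```shell
# Evaluate the example estimator's memory formula (25.15 MB per input)
# for a given number of inputs.
estimate_memory_mb() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 25.15 * n }'
}

estimate_memory_mb 10    # → 251.50
estimate_memory_mb 200   # → 5030.00
```

At 200 inputs that is roughly 5GB, close to the entire default RAM pool on an 8GB machine.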

Capping Concurrency

Bazel has many other flags that influence concurrency. The resource-oriented APIs above are a better abstraction for most use cases, but these are worth knowing about.

--jobs
Specifies the maximum number of concurrent actions. Bazel calculates a default based on host resources (e.g. CPU core count).
Depending on how your build utilises system resources (e.g. actions that put little load on the CPU), the default may be too low and slow the build down.
--local_test_jobs
The maximum number of local tests to run concurrently. Defaults to 0, which defers to the standard action concurrency and scheduling caps.
Setting this to a value above --jobs has no effect.
--worker_max_instances
The maximum number of persistent worker processes for a given mnemonic (e.g. --worker_max_instances=Javac=2), or for all mnemonics which have not been explicitly set. Bazel calculates a default based on host resources.
--worker_max_multiplex_instances
The maximum number of work requests a multiplexed persistent worker will be given at a time. Like --worker_max_instances, this can be set for a given mnemonic, or for all mnemonics which have not been explicitly set. Bazel calculates a default based on host resources.