Parallelizing and running distributed builds with distcc

Parallelizing the compilation of a large codebase is a breeze with distcc, which spreads the compile jobs across multiple nodes to cut the overall build time.

The sample setup for this distributed build has three nodes on the 192.168.2.0/24 network: a master node that drives the build and two worker nodes.

Install distcc on the three Debian/Ubuntu-based nodes:

# apt install distcc

Edit /etc/default/distcc and set:

STARTDISTCC="true"

# Networks allowed to connect; customize for your environment
ALLOWEDNETS="192.168.2.0/24"

# IP address of the interface that distccd should listen on
LISTENER="192.168.2.146"

Additionally, the JOBS and NICE variables can be tweaked to suit the compute power that you have available.
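As a rough illustration, both variables live in the same /etc/default/distcc file. The values below are assumptions for a hypothetical worker, not recommendations:

```shell
# /etc/default/distcc (continued); values are illustrative assumptions

# Maximum number of concurrent compile jobs this node accepts.
# When left empty, distccd falls back to its own default.
JOBS="6"

# Nice level for the compile jobs (a higher value means lower priority).
NICE="10"
```

A lower JOBS value or a higher NICE value keeps a worker responsive if it is also used for other things.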

Start distcc:

# systemctl start distcc

Do the same on all the nodes. If you have a firewall enabled with ufw, you will need to open port 3632 on each worker node so that the master node can reach it.

# ufw allow 3632/tcp
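The rule above accepts connections to port 3632 from any address. A narrower rule that admits only the master node could look like the sketch below. MASTER_IP and its value are placeholders, since the master's address is not stated here, and distcc_ufw_rule is a helper made up for this example. The snippet only prints the command so that it can be reviewed before being run as root.

```shell
# Placeholder address for the master node (an assumption); replace with your own.
MASTER_IP="192.168.2.10"

# Build a ufw rule that admits only the given source address to distccd's port.
distcc_ufw_rule() {
    printf 'ufw allow from %s to any port 3632 proto tcp\n' "$1"
}

# Print the rule for review; run the printed command as root to apply it.
distcc_ufw_rule "$MASTER_IP"
```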

Additionally, if the nodes communicate over an untrusted network, distcc can run over SSH so that code and communication with the worker nodes travel over a secure channel. For this, ensure that SSH is running on the worker nodes and is reachable from the master node in the same manner as above, with the public key of the master node in ~/.ssh/authorized_keys on each worker. By default, plain TCP on port 3632 is unencrypted and protected only by the ALLOWEDNETS check, so exposing it is a security hole on untrusted networks.

Back on the master node, set up a DISTCC_HOSTS environment variable that lists the nodes to use, including the master node itself. The order of the hosts matters: hosts earlier in the list are used more heavily, and distcc has no way of knowing the capacity and capability of each host, so specify the most powerful host first.

export DISTCC_HOSTS='localhost 192.168.2.107 192.168.2.91'
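Since distcc cannot measure host capacity, the host list also accepts a per-host job limit after a slash. The sketch below shows that form, plus a commented-out SSH variant where an @ prefix selects the SSH transport. The limits are made-up numbers for illustration.

```shell
# Job limits after the slash are illustrative assumptions, not measured values.
# Plain TCP: cap each host at a number of simultaneous jobs.
export DISTCC_HOSTS='localhost/4 192.168.2.107/8 192.168.2.91/8'

# SSH transport: an @ before the host makes distcc connect over ssh
# instead of TCP port 3632.
# export DISTCC_HOSTS='localhost/4 @192.168.2.107/8 @192.168.2.91/8'

# Show the host list distcc will actually use, if distcc is installed.
if command -v distcc >/dev/null 2>&1; then
    distcc --show-hosts || true
fi
```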

At this point, you’re ready to compile.

Go to your codebase. This example uses the Linux kernel source code.

$ make tinyconfig 
$ time make -j$(nproc) CC=distcc bzImage
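Note that $(nproc) counts only the local CPUs, so the job count may be too low to keep the worker nodes busy. One possible variation, sketched below, asks distcc for the concurrency it derives from DISTCC_HOSTS with distcc -j and falls back to nproc when that is not available. The variable name BUILD_JOBS is made up for this example, and naming gcc explicitly in CC is an assumption about the compiler in use.

```shell
# Pick a job count: distcc's own estimate if available, else the local CPU count.
BUILD_JOBS=""
if command -v distcc >/dev/null 2>&1; then
    BUILD_JOBS="$(distcc -j 2>/dev/null || true)"
fi

# Fall back to nproc if the value is empty, zero, or not a plain number.
case "$BUILD_JOBS" in
    ''|0|*[!0-9]*) BUILD_JOBS="$(nproc)" ;;
esac
echo "building with $BUILD_JOBS jobs"

# Then, from the kernel source tree:
#   make -j"$BUILD_JOBS" CC="distcc gcc" bzImage
```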

In another terminal, you can monitor the status of the distributed compilation with distccmon-text, or with tools such as top or bpytop.

Network throughput and latency largely determine how much distcc will speed up your build process. Using SSH adds further overhead, so experiment with the host list, the job counts, and the transport to see how much distcc helps in your specific scenario. You may also want to consider ccache to speed up repeated builds.
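One way to combine the two tools, sketched here under the assumption that ccache is installed and gcc is the compiler, is to let ccache answer from its cache where it can and pass the remaining compilations to distcc through the CCACHE_PREFIX variable:

```shell
# Have ccache hand cache misses to distcc instead of compiling them locally.
export CCACHE_PREFIX="distcc"

# Then build with ccache as the compiler front end (gcc is an assumption):
#   make -j"$(nproc)" CC="ccache gcc" bzImage
```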

Some parts of the build process cannot be distributed in this manner, such as the final linking step of the executable, so you will not see any performance improvement from distcc there.

Give distcc a spin, and put any spare compute you have lying around in your home lab to good use.

Anuradha Weeraman

Updated on Mar 24, 2024

Distributed Systems
