Hello,

when trying to run a Guix build agent in a Docker container on OpenShift with a colleague, with 8 of the 128 cores of the physical machine assigned to it, we found the agent completely choked, since it started all builds with commands such as "make -j 128". The 128 is determined by a call to the Guile function current-processor-count, which calls nproc from coreutils (see "man nproc"). This works on bare metal and in virtual machines, but not in containers or, more generally, when cgroups are used to limit the number of cores.

Additionally, but less crucially, this probably renders the max-1min-load-average parameter of guix-build-coordinator-agent-configuration completely useless: in the example, the machine could have a load of 120 on the other cores while the part attached to the build agent is idle.
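To make the mismatch concrete, here is a minimal sketch in Python (chosen only for brevity; it is not Guix or Guile code). It assumes cgroup v2 and that the core limit is enforced as a CPU quota in /sys/fs/cgroup/cpu.max, which may not hold for every OpenShift setup; the function names are made up for illustration.

```python
import math
import os


def quota_cores(cpu_max_text):
    """Return the CPU limit encoded in a cgroup v2 cpu.max line.

    The file holds "<quota> <period>" in microseconds, or "max <period>"
    when no limit is set; in the latter case None is returned.
    """
    quota, period = cpu_max_text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def effective_cores(visible, cpu_max_text):
    """Combine the visible core count with the cgroup quota, if any."""
    limit = quota_cores(cpu_max_text)
    if limit is None:
        return visible
    # Round fractional quotas up and never report fewer than one core.
    return max(1, min(visible, math.ceil(limit)))


def read_cpu_max(path="/sys/fs/cgroup/cpu.max"):
    """Read cpu.max; fall back to "no limit" if the file is missing
    (assumption: this means cgroup v1 or no cgroup at all)."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return "max 100000"


if __name__ == "__main__":
    # Cores visible to the process; this is where the 128 comes from.
    if hasattr(os, "sched_getaffinity"):
        visible = len(os.sched_getaffinity(0))
    else:
        visible = os.cpu_count() or 1
    print("visible cores:  ", visible)
    print("effective cores:", effective_cores(visible, read_cpu_max()))
```

With 128 visible cores and a quota line of "800000 100000", effective_cores returns 8, which is the value one would want to end up in "make -j".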
This can be worked around by passing extra arguments by hand, such as "--cores=8" to the guix daemon service, and by adapting max-parallel-builds of the build agent service. Still, it would be nice to have a more automated approach (for instance, when changing the number of assigned cores in OpenShift, one does not want to recreate a Docker container with new manual parameters).

Here is how far we got concerning a potential solution. When cgroups are available, the file /sys/fs/cgroup/cpu.pressure contains some measure of load congestion:

   some avg10=8.28 avg60=5.50 avg300=2.11 total=365519361
   full avg10=0.00 avg60=0.00 avg300=0.00 total=0

Its contents are described here:
   https://www.kernel.org/doc/html/latest/accounting/psi.html#psi

The "full" line is meaningless. I am not exactly sure what is measured by the "some" line: it is not the load, but a percentage of time during which "some tasks are stalled on a given resource".

It looks like the max-1min-load-average of the build agent service could be replaced by a threshold for the avg60 value of this file. To obtain the current value, the libcgroup library, which is already available in Guix, can be used; we may need to write Guile bindings. I suppose that the number of available cores can be determined in a similar manner.

What do you think?

Andreas
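
PS: To illustrate the threshold idea, here is a rough sketch in Python (not Guile, only to keep it short). It parses text in the cpu.pressure format quoted above and compares the avg60 value of the "some" line against a limit. The names parse_psi, agent_should_wait and max_avg60 are hypothetical and not part of the build coordinator; a sensible threshold value would have to be found by experiment.

```python
def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}}.

    Each line looks like "some avg10=8.28 avg60=5.50 avg300=2.11 total=...".
    All values are returned as floats.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


def agent_should_wait(psi_text, max_avg60):
    """Return True if the 60 s "some" pressure exceeds max_avg60 (percent).

    max_avg60 is a hypothetical stand-in for max-1min-load-average.
    """
    some = parse_psi(psi_text).get("some", {})
    return some.get("avg60", 0.0) > max_avg60


# The sample values from the message above.
SAMPLE = """some avg10=8.28 avg60=5.50 avg300=2.11 total=365519361
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
"""

if __name__ == "__main__":
    print(parse_psi(SAMPLE)["some"]["avg60"])   # 5.5
    print(agent_should_wait(SAMPLE, 5.0))       # True
    print(agent_should_wait(SAMPLE, 10.0))      # False
```

Since the file is plain text, reading it directly like this might already be enough; whether libcgroup bindings would add anything is an open question.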