Hi,
Let me plop right into this discussion with no general solution and
more things to think about. For context, I'm packaging Java things,
and Java has historically been notoriously bad at guessing how much
memory it could actually use on a given system. I'm not sure things are
much better these days. This is just a reminder that the issue is nowhere
near as easy as it looks and that many attempts to generalize an
approach that works well in some cases have failed.
On 2024-12-09 14:42, Guillem Jover wrote:
> My thinking here was also about the general case too, say a system
> that has many cores relative to its available memory, where each core
> would get what we'd consider not enough memory per core
This is actually a common situation on all but a few privileged
developer configurations. It is especially true in cloud-like,
VM/containerized environments, where it is much easier (i.e. with less
immediate consequences) to overcommit CPU cores than RAM. Just look at
the price list of any cloud computing provider to get an idea of the
ratios you could start with. And then the provider may well lie about
the actual availability of the cores they will readily bill you for;
you will only notice when your application grinds to a halt at
the worst possible time (e.g. on a Black Friday if your business is to
sell stuff), but at least it won't get OOM-killed.
There are a few packages that worry me about how I'm going to
make them build and run their test suites on Salsa without either
timing out at one end of the slider or getting immediately OOM-killed
at the other. One of them wants to allocate 17 GiB of RAM per test
worker, and wants at least 3 of them. Another (Gradle) needs
approximately 4 GiB of RAM per additional worker for its build (JVM
processes alone; adding OS cache + overhead probably brings the total
to around 6-7 GiB), and I don't know yet how much is needed for its
test suites as my current setup lacks the storage space necessary to
run them. On my current low-end laptop (4 threads, 16 GiB RAM) dpkg's
guesses [1] are wrong: I can only run a single worker if I want to keep
an IDE and a web browser running on the side, two if I close the IDE
and kill all browser tabs and other memory hogs. I would expect FTBFS
bug reports if a run-of-the-mill dpkg-buildpackage command failed to
build the package on such a system.
> (assuming for
> example a baseline for what dpkg-deb might require, plus build helpers
> and their interpreters, and what a compiler with say an empty C, C++
> or similar file might need, etc).
+1 for taking a baseline into consideration, as the first worker is
usually significantly more expensive than additional workers. In my
experience with Java build processes the first-worker penalty is in the
vicinity of +35%, and it can be much higher for lighter build processes
(but then they are lighter and less likely to hit a limit except in very
constrained environments).
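
To make that concrete, here is a rough sketch (not what [1] actually
does) of a computation that accounts for both a baseline and a
per-worker cost. The ~8 GiB first worker (roughly the ~6 GiB total per
worker above plus the ~35% penalty) and ~6 GiB per additional worker
reuse the Gradle figures mentioned earlier, and the variable names are
made up:

# Sketch only: derive a worker count from MemAvailable, assuming
# ~8 GiB for the first worker (per-worker cost plus first-worker
# penalty) and ~6 GiB for each additional one, capped by CPU threads.
MEM_GIB := $(shell awk '/^MemAvailable:/ { printf "%d", $$2 / 1048576 }' /proc/meminfo)
MEM_WORKERS := $(shell w=$$(( 1 + ($(MEM_GIB) - 8) / 6 )); [ $$w -lt 1 ] && w=1; echo $$w)
CPU_WORKERS := $(shell nproc)
WORKERS := $(shell [ $(MEM_WORKERS) -lt $(CPU_WORKERS) ] && echo $(MEM_WORKERS) || echo $(CPU_WORKERS))

A real rules file would also want to honour an explicit parallel=N in
DEB_BUILD_OPTIONS, but that is the general shape of the guess.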
Another thing I would like to add is that the requirements may change
depending on the phase of the build, especially between building and
testing. For larger projects, building usually requires more memory but
less parallelism than testing. You could always throw more workers at
the build, but at some point additional workers will just sit mostly idle,
consuming RAM and resources, as there is a limited number of tasks that
the critical path will allow at any given point. Testing, especially
with larger test suites, usually allows for (and sometimes needs) much
more parallelism.
Also worth noting, on some projects the time spent testing can be orders
of magnitude greater than the time spent building.
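
For illustration, in a debhelper-style rules file that could look
something like the following. The worker counts are arbitrary,
--max-workers is Gradle's flag (other build tools have their own
equivalents), and whether dh_auto_build/dh_auto_test pass the option
through depends on the buildsystem plugin, so treat this purely as a
sketch:

# Sketch: fewer, heavier workers while building, more while testing.
BUILD_WORKERS := 2
TEST_WORKERS := 4

override_dh_auto_build:
	dh_auto_build -- --max-workers=$(BUILD_WORKERS)

override_dh_auto_test:
	dh_auto_test -- --max-workers=$(TEST_WORKERS)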
This could also imply alternatively or in addition, providing a tool
or adding some querying logic in an existing tools (in the dpkg
toolset)
to gather that information which the packaging could use, or…
Additional tooling may help a bit, but I think what would really help at
that point would be to write and publish guidelines relevant to the
technology being packaged, based on empirical evidence collected while
fine-tuning the build or packaging, and kept reasonably up to date (i.e.
never more than 2-3 years old) with the current state of technologies
and projects. Salsa (or other CI) pipelines could be instrumented to
provide some data, and once the guidelines cover a majority of packages
you will have better insight into what, if anything, needs to be done
with the tooling.
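
As a cheap first step for that instrumentation, a CI job (or a local
build) could simply wrap the build in GNU time to record peak memory
use. Note that the reported figure is for the largest single process,
not the sum across concurrent workers, so it is only a lower bound:

# Wrap the build and keep GNU time's report (written to stderr).
/usr/bin/time -v dpkg-buildpackage -us -uc 2>&1 | tee build.log
# The interesting line is "Maximum resident set size (kbytes)".
grep 'Maximum resident set size' build.log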
[1]:
https://salsa.debian.org/jpd/gradle/-/blob/upgrade-to-8.11.1-wip/debian/rules#L49
--
Julien Plissonneau Duquène