On May 23, 2025, at 12:13, Dennis Clarke <dcla...@blastwave.org> wrote:
> On 5/23/25 15:00, Mark Millard wrote:
>> Dennis Clarke <dclarke_at_blastwave.org> wrote on
>> Date: Fri, 23 May 2025 17:45:17 UTC :
>>> I have been watching qt6-webengine-6.8.3 fail over and over and over
>>> for some days now and it takes with it a pile of other stuff.
>>>
>>> In the log I see this unscripted trash of a message :
>>>
>>> [00:05:03] FAILED: v8_context_snapshot.bin
>>> [00:05:03] /usr/local/bin/python3.11
>>> ../../../../../qtwebengine-everywhere-src-6.8.3/src/3rdparty/chromium/build/gn_run_binary.py
>>> ./v8_context_snapshot_generator --output_file=v8_context_snapshot.bin
>>> [00:05:03]
>>> [00:05:03]
>>> [00:05:03] #
>>> [00:05:03] # Fatal error in , line 0
>>> [00:05:03] # Oilpan: Out of memory
>>> [00:05:03] #
>>> [00:05:03] #
>>
>> Way too little context so all I can do is basically form
>> questions at this point.
>
> Sorry ... I just realized that other people replied to me OFF-LIST and
> that is not helpful to others.
>
> So the machine titan is fairly beefy :
>
> titan#
> titan# uname -apKU
> FreeBSD titan 15.0-CURRENT FreeBSD 15.0-CURRENT #1 main-n277353-19419d36cf2a:
> Mon May 19 20:40:28 UTC 2025
> root@titan:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 amd64 1500043
> 1500043
> titan#
> titan# sysctl hw.model
> hw.model: Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz
> titan#
> titan# sysctl hw.ncpu
> hw.ncpu: 64
> titan#
> titan# sysctl hw.physmem
> hw.physmem: 549598998528
> titan#
> titan# sysctl hw.freemem
> sysctl: unknown oid 'hw.freemem'
> titan#
> titan# sysctl kstat.zfs.misc.arcstats.memory_free_bytes
> kstat.zfs.misc.arcstats.memory_free_bytes: 404796436480
> titan# sysctl vm.kmem_map_free
> vm.kmem_map_free: 431405096960
> titan#
>
> Also plenty of storage and local NVMe stuff etc etc and dual
> NVidia GPU's that do nothing at all. For now.
>
> We ( myself and others ) have already found that the problem was
> me. No big surprise.
>
> USE_TMPFS=yes
> TMPFS_LIMIT=32
> MAX_MEMORY=32
> # MAX_FILES=1024
> MAX_EXECUTION_TIME=172800
> PARALLEL_JOBS=64
> PREPARE_PARALLEL_JOBS=64
>
> That was the problem in the poudriere config.
>
> I commented out the MAX_MEMORY and TMPFS_LIMIT and then watched
> as www/qt6-webengine built just fine. Guess the jail needed more
> than 32G eh?
>
>> I assume that you have not explicitly restricted the memory
>> space for any processes, so that RAM+SWAP is fully available
>> to everything. If not, you need to report on the details.
>>
>
> Yup .. I had restrictions in place. Those very very few packages
> are hogs. Just massive running pigs for memory it seems.
>
>> How much RAM? How much SWAP space? (So: how much RAM+SWAP?)
>> (RAM+SWAP does not vary per process tree or per builder,
>> presuming no deliberate restrictions have been placed.)
>
> 512G mem and 32G swap which never gets touched.

So RAM+SWAP == 544 GiBytes

544 GiBytes / 64 jobs == 8.5 GiBytes/job [mean]

Some builders will exceed that, for sure. But you are really managing
the total across the 64 jobs (adjusting the jobs count would be a
possibility).
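(As a rough sketch only, not something from my actual scripts: the mean
can be recomputed on the host with the base sysctl(8) and swapinfo(8)
tools. The "64" is just the PARALLEL_JOBS value, and the shell
arithmetic is integer-only.)

# Sketch: mean RAM+SWAP per builder, in MiBytes.
# hw.physmem is in bytes; swapinfo -k reports 1K blocks and its last
# line is the total (or the only swap device).
jobs=64
ram=$(sysctl -n hw.physmem)
swap=$(( $(swapinfo -k | awk 'END { print $2 }') * 1024 ))
echo "$(( (ram + swap) / jobs / 1024 / 1024 )) MiBytes/job [mean]"

With the figures above that comes out to roughly 8700 MiBytes/job,
i.e. about the 8.5 GiBytes/job mean.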
I do things like:

For 32 GiBytes of RAM: RAM+SWAP == 150 GiBytes
(so: RAM+SWAP == 4.6875*RAM)

So for 8 hw threads (and, so, 8 jobs):
150 GiBytes / 8 jobs == 18.75 GiBytes/job [mean]

For 64 GiBytes of RAM: RAM+SWAP == 300 GiBytes
(so: RAM+SWAP == 4.6875*RAM)

So for 12 performance hw threads (and, so, 12 jobs):
300 GiBytes / 12 jobs == 25 GiBytes/job [mean]

But for 16 hw threads (and, so, 16 jobs):
300 GiBytes / 16 jobs == 18.75 GiBytes/job [mean]

For 192 GiBytes of RAM: RAM+SWAP == 704 GiBytes
(so: RAM+SWAP == 3.66...*RAM, I had other reasons to constrain this
context more)

So for 32 hw threads (and, so, 32 jobs):
704 GiBytes / 32 jobs == 22 GiBytes/job [mean]

But I also use TMPFS_BLACKLIST extensively with USE_TMPFS=all to avoid
having much probability of running out of resources. (This actually
uses less than all, but it does not eliminate tmpfs use from being
involved for the blacklisted package builds.)

This is for use of ALLOW_MAKE_JOBS=yes (so a high load average context
much of the time). I do end up with examples of SWAP being used for
paging.

In a couple of those hw contexts, I, on very rare occasion, do a
'bulk -c -a' run (as a test), also with ALLOW_MAKE_JOBS=yes. (I
generally do not use MAKE_JOBS_NUMBER_LIMIT or the like.) A bias is to
keep the hw threads busy with already-available useful work to do
without having to systematically wait for the next unit of work. (Of
course, other tradeoffs are also managed.)

Note: My original questions should have asked about ALLOW_MAKE_JOBS and
the likes of MAKE_JOBS_NUMBER_LIMIT use as well. (Not that such matters
now.)

>> Do you even have "whatever it seems to want" configured
>> for the RAM+SWAP? (I'm guessing that you do not know that
>> the "128G" figure is in fact involved.)
>>
>
> I commented out those restrictions. Makes me worry that some other
> packages will come along and fail because they need 384G of mem or
> something silly like that.

I'd be more worried about multiple jobs that each use more than
8.5 GiBytes and happen to push the total past (RAM+SWAP)
[544 GiBytes].

Side note: at one point in a past 'bulk -c -a' run test, a process hit
a runaway error that rapidly wrote to the log file without bound. Not
good if the log file is in tmpfs. But such has happened only with one
vintage of the ports tree. It is the only context I've seen that would
reach or pass 384 GiBytes for one builder.

> I have been advised ( in the last hour )
> that chromium ports generate +40000 source files and such. That is
> just abusive but the way of the future I am sure.

rust with USE_TMPFS=all and not in TMPFS_BLACKLIST can use 27+ GiBytes
of RAM+SWAP. And it is not the largest example around.

>> How many parallel builders are active in the bulk run
>> as the bulk build approaches the failure?
>
> I think 64 max.
>
>> How much RAM+SWAP in use by other builders or other things
>> on the system as the system progresses to that failure (not
>> after the failure)?
>
> ....
>
> *sigh*
>
> The problem was me.
>
>> ZFS (and its ARC)? UFS? If ZFS: Any tuning?
>>
>
> No tuning. It just works(tm) and that is ZFS.

Most of my use is UFS based. Only the machine with the most hw threads
and RAM is ZFS based these days. (Part of that is to serve as a UFS
test case, since UFS use for such activities is less common: At one
time, I noticed and reported that 'bulk -c -a' was at the time broken
under UFS because of reaching a UFS limitation that was later
adjusted.)
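To illustrate the TMPFS_BLACKLIST style mentioned above, a rough sketch
of the relevant poudriere.conf lines follows (the globs and the path
here are made-up examples, not my actual configuration):

# Sketch only: builders use tmpfs in general ...
USE_TMPFS=all
# ... but known memory hogs are kept out of tmpfs (package name globs;
# example entries only):
TMPFS_BLACKLIST="rust* llvm* chromium* qt6-webengine*"
# Blacklisted builds still need a disk-backed work area (example path):
TMPFS_BLACKLIST_TMPDIR=${BASEFS}/wrkdirs
# And, as noted, I build with:
ALLOW_MAKE_JOBS=yes

Unlike MAX_MEMORY or TMPFS_LIMIT, this does not cap what a build may
use; it just keeps the biggest WRKDIRs from competing with the build
processes for RAM+SWAP.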
>> Basically: all the significant sources of competing for
>> RAM+SWAP?
>> ....
>>
>> ===
>> Mark Millard
>> marklmi at yahoo.com
>
>
> It feels like the correct approach is just to give everything to the
> poudriere bulk situation and then watch for flames.
>
> No flames? No smoke? Great .. it is working.

I did experimentation with 'bulk -c -a' to arrive at configuration
choices that would survive such and left sizable margins for handling
growth without regular adjustments. So my flame tests were more up
front.

===
Mark Millard
marklmi at yahoo.com