On Thursday, 20 June 2024 16:29:11 BST Peter Humphrey wrote:
> On Thursday, 20 June 2024 14:40:12 BST Michael wrote:
> > On Thursday, 20 June 2024 14:27:18 BST Jack wrote:
> > > On 6/20/24 8:46 AM, Peter Humphrey wrote:
> > > > While building a new KDE system (see my post a few minutes ago), I'm
> > > > finding the system stalling because it can't handle all its install
> > > > jobs. I have this set:
> > > >
> > > > $ grep '\-j' /etc/portage/make.conf
> > > > EMERGE_DEFAULT_OPTS="--jobs --load-average=30 [...]"
> > >
> > > I don't know how much it would matter, but are you missing a number
> > > after --jobs?
> >
> > Without a number of jobs specified in make.conf emerge will not limit the
> > number of packages it tries to build, except it will not start new jobs
> > while there are at least --load-average=30 running already.
> >
> > > > MAKEOPTS="-j16 -l16"
>
> We went through all this at great length not long ago (months, perhaps: a
> certain A. McK had returned to the list for a while). /usr/bin/make will
> stop spawning make jobs once either (a) the number it's running reaches
> -j16 or (b) the load average of those reaches -l16. Portage sending more
> tasks to /usr/bin/make simply fills the latter's input queue.
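To make the two layers compose predictably, both limits can be stated explicitly in make.conf rather than leaving --jobs unbounded. A minimal sketch (the numbers are illustrative for a 24-thread, 64GB box, not a recommendation):

```shell
# /etc/portage/make.conf -- illustrative values, not a recommendation

# Per-package build parallelism: at most 16 compiler jobs, and make
# will not spawn new ones while the load average is at or above 16.
MAKEOPTS="-j16 -l16"

# Portage-level parallelism: cap concurrent emerge jobs explicitly
# instead of a bare --jobs, so emerge cannot pile up an unbounded
# number of packages behind make's input queue.
EMERGE_DEFAULT_OPTS="--jobs=4 --load-average=16"
```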
Quite. Make will queue up anything above ~16 jobs, but emerge runs more than
just make jobs. More and more emerge processes will kick off, up to ~30. Each
emerge process will eventually launch make jobs, only for these to join a
pile-up in an ever more congested make queue, unable to proceed further. At
some point the allocation and reallocation of memory for these queues appears
to have become gnarly. Perhaps something in portage's python code leads to a
race condition? I don't know whether the combination of queuing up all these
parent-child instructions and their parallelism can create an unchecked race
condition; perhaps you reached some memory allocation limit, or indeed hit a
bug in the code. These are just loose suppositions of mine, not evidence from
detailed debugging, let alone knowledge of python.

> > > > The CPU has 24 threads and 64GB RAM, and lots of swap space, and
> > > > those values have worked well for some time. Now, though, I'm going
> > > > to have to limit the --jobs or the --load-average.
> > > >
> > > > On interrupting one such hang, I found that 32 install jobs had been
> > > > waiting to run; is this limit hard coded?
>
> It's certainly a suspicious number.

Apologies if I'm being dense here - why is it a suspicious number? I see a
--load-average of ~30 emerge-instigated 'make install' jobs being queued up,
while some previous 16 make jobs are still being processed.

> > I take it the --load-average is what it says, an average, so it will
> > jump above the specified number if you have not limited the --jobs
> > number.
>
> See above re. input queue.

> > > > I also saw "too many jobs" or something, and "could not read job
> > > > counter".
> > > >
> > > > Is it now bug-report time?
> >
> > You could set up a swap file, to avoid OOM situations, while you're
> > tweaking the --jobs & --load-average.
>
> The existing 64GiB swap partition is rarely touched, if ever. I've never
> seen an OOM error.
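Incidentally, as I understand it (an assumption on my part about the implementation, not something I've verified in the source), the figure that both make's -l and emerge's --load-average compare against is simply the system's 1-minute load average, readable from /proc/loadavg:

```shell
# Read the 1-minute load average: the first field of /proc/loadavg.
# make -l16 declines to start new jobs while this figure is >= 16,
# which is why queued jobs pile up behind it rather than failing.
load1=$(cut -d' ' -f1 /proc/loadavg)
echo "current 1-minute load: $load1"
```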
> I haven't touched jobs or loads for many months until today, nor have I
> seen a failure to read a job counter.

I don't know if the counters are stored in memory, with
running/completed/failed counts, or on disk. I can't see how either DDR4 or
an NVMe would clog up its I/O channels, but you clearly witnessed a failure.
Could this be a hardware glitch? You'll soon know if it shows up as a
repeatable problem.

> Anyway, it still rankles that I can't use more than half the machine's
> power because of limits in portage. This can't be the only 64GiB machine
> in gentoo-land, surely.

I use 64G with no swap and MAKEOPTS="-j25 -l24.8". I haven't yet come across
a failure like yours, but I rarely try to run more than one emerge process at
a time on this system. It's fast enough for my limited needs without having
to increase the number of emerges at a time.

On another PC with 32G RAM, which I often use as a binhost, when I start two
separate emerge processes manually with MAKEOPTS="-j10 -l9.8" I sometimes see
a bit of swap being used, especially when the PC's user has been hammering
the browser for hours with many tabs open. Anyway, the MAKEOPTS directives
control resource usage without hiccups. Does running parallel emerges via
EMERGE_DEFAULT_OPTS save enough time to be worth the addition?
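For anyone tuning such values from scratch, here is a rough way to derive them (purely a rule of thumb of mine, including the ~2GB-per-job figure; nothing portage mandates): one make job per hardware thread, capped by available RAM.

```shell
#!/bin/sh
# Rule-of-thumb MAKEOPTS calculator (a heuristic, not a portage rule):
# one make job per hardware thread, capped by roughly 2GB of RAM per
# job so heavy C++ link steps don't invite the OOM killer.
threads=$(nproc)
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
ram_gb=$(( mem_kb / 1024 / 1024 ))

jobs=$threads
ram_jobs=$(( ram_gb / 2 ))
if [ "$ram_jobs" -ge 1 ] && [ "$ram_jobs" -lt "$jobs" ]; then
    jobs=$ram_jobs
fi

echo "MAKEOPTS=\"-j$jobs -l$threads\""
```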