On 5/7/24 15:32, Henderson, Brent via slurm-users wrote:
Over the past few days I grabbed some time on the nodes and ran for a few hours.  It looks like I **can** still hit the issue with cgroups disabled.  The incident rate was 8 out of >11k jobs, so it dropped by roughly an order of magnitude.  I’m guessing that exonerates cgroups as the cause; they may just be a good way to tickle the real issue.  Over the next few days, I’ll try to roll everything back to RHEL 8.9 and see how that goes.

My 2 cents: RHEL/AlmaLinux/RockyLinux 9.4 is out now, so maybe it's worth trying an update to 9.4?

/Ole

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
