Hi Hermann,

no, we don't use --propagate.

In slurm.conf, we set PropagateResourceLimits=CORE. That in fact means that we do not propagate any limits besides the core size (excerpt from the slurm.conf man page):
If neither PropagateResourceLimits or PropagateResourceLimitsExcept are configured and the "--propagate" option is not specified, then the default action is to propagate all limits.
So, the maximum number of processes should not be propagated from the submit nodes to the batch nodes. Moreover, I do not know where that high limit might come from.
In /etc/security/limits.conf we set

    * soft nproc 262144

ulimit -u gives me 16384 on the submit nodes.

The batch jobs are still working as expected, but that "error" message is somewhat disturbing.
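For comparing the submit nodes with the batch nodes, a minimal sketch that prints both the soft and the hard value of RLIMIT_NPROC for the current process could look like this. It assumes Linux and Python's standard resource module; the helper names describe and nproc_limits are made up for illustration. Note that "ulimit -u" shows only the soft value.

```python
import resource


def describe(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def nproc_limits():
    """Return (soft, hard) for RLIMIT_NPROC of the current process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    # Run this once in a login shell on the submit node and once inside
    # a batch job, then compare the two lines.
    print(f"RLIMIT_NPROC soft={describe(soft)} hard={describe(hard)}")
```

Running it in both places should show whether the value from the error message matches a limit that is actually set on either side.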
Best
Marcus

On 23.03.2023 at 10:01, Hermann Schwärzler wrote:
Hi Marcus,

I am not sure if this is helpful, but from looking at the source code of Slurm (line 276 of src/slurmd/slurmstepd/ulimits.c in version 22.05) it looks like you are explicitly using "--propagate..." to set resource limits (the ones you see when running "ulimit -a") on the workers the same as on the submit host.

The error "Invalid argument" is returned when Slurm wants to set the hard limit lower than the (default?) soft limit (in this particular case for the maximum number of processes, "ulimit -u").

Maybe your hard limit for that on the submit host is configured to be lower than it is on the worker nodes; Slurm gets this error and shows it to you as if you were using the --propagate option?

Regards,
Hermann

On 3/23/23 08:00, Wagner, Marcus wrote:

Hi Folks,

has anyone ever stumbled upon such an error:

slurmstepd: error: Can't propagate RLIMIT_NPROC of 767202 from submit host: Invalid argument

Does anyone know where that comes from? Any hints are welcome.

Best
Marcus
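The "Invalid argument" case described in Hermann's message can be reproduced outside of Slurm. The following is a minimal sketch, assuming Linux and Python's standard resource module; it is not Slurm's code, and the helper name try_set_nproc is made up for illustration. It asks the kernel for a soft limit that is higher than the hard limit, which setrlimit rejects with EINVAL; Python reports that as a ValueError.

```python
import resource


def try_set_nproc(soft, hard):
    """Try to set RLIMIT_NPROC; return None on success, else the error text."""
    try:
        resource.setrlimit(resource.RLIMIT_NPROC, (soft, hard))
    except (ValueError, OSError) as exc:
        # EINVAL (soft > hard) and EPERM (raising the hard limit without
        # privilege) surface as ValueError; other failures as OSError.
        return str(exc)
    return None


if __name__ == "__main__":
    # soft=2 is above hard=1, so the request is rejected and the
    # limits of this process stay unchanged.
    print(try_set_nproc(2, 1))
```

The pair (2, 1) is chosen only because it is invalid whatever the current limits are.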