ch contain
the autoconf macros mentioned in your error message).
--
Janne Blomqvist
> cgroup is problematic on array jobs.
Thanks for trying it out! Indeed, we only recently upgraded to 18.08 and
it seems the upgrade broke it. Fixed now (or broke it if you're still on
17.11... :-) ).
I also fixed the array jobs issue while at it.
--
Janne Blomqvist
e comment field at the end of
the job.
The above is an ansible role, but if you're not using ansible you can
just pull the scripts from the "files" subdirectory.
--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
intended? Is it a consequence of something in my environment?
I vaguely recall reading somewhere that accounting_storage_mysql is
deprecated and that one should use slurmdbd instead. So maybe it's
possible to sidestep the problem by not trying to build that module at all?
--
Janne Blomqvist
malize" the fairshare consumption based on
the geometric mean of a set of hopefully not too unrepresentative
single-node benchmarks [1].
We also set a memory billing weight, and have MAX_TRES among our
PriorityFlags, approximating dominant resource fairness (DRF) [2]
[1] https://github.com/Aa
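The geometric-mean normalization described above can be sketched as
follows. This is only an illustration: the function name and the speedup
numbers are invented for the example, not Aalto's actual benchmark data.

```python
import math

def normalized_cpu_weight(node_speedups):
    """Geometric mean of per-node benchmark speedups relative to a
    reference node. Hypothetical helper, not the actual Aalto tooling."""
    logs = [math.log(s) for s in node_speedups]
    return math.exp(sum(logs) / len(logs))

# e.g. three node generations benchmarked at 1.0x, 1.4x and 2.0x
# relative to the reference node
weight = normalized_cpu_weight([1.0, 1.4, 2.0])
```

The geometric mean is used rather than the arithmetic mean so that one
unrepresentative benchmark skews the weight less.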
RAMSpace=no
ConstrainSwapSpace=yes
AllowedSwapSpace=400
(Note that it's not possible to separately set the maximum swap usage.
If you instead limit only the memory and not mem+swap, it will limit
memory but swap usage will be unlimited.)
As for the second part of your question: no, even if the server isn't
generally overloaded there can still be occasional spikes causing these
kinds of issues. We used to
suffer from these errors as well, in our case it was enough to bump
somaxconn and tcp_max_syn_backlog (we use 4096 for both). See also
https://slurm.schedmd.com/high_throughput.html
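For reference, the two kernel knobs mentioned above can be persisted in a
sysctl drop-in like the following (the 4096 values are the ones from this
post; the file name is arbitrary and you should tune values for your site):

# /etc/sysctl.d/90-slurm-backlog.conf
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096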
not delete a dataset used by a running job.
But nothing concrete has been done yet. Anyway, I'm open to suggestions
for better approaches, or existing tools that already solve this problem.
--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || janne.blomqv...@aalto.fi
for memory limits, you should also set
JobAcctGatherParams=NoOverMemoryKill
- If you're NOT using cgroups for memory limits, try setting
JobAcctGatherParams=UsePSS, which should avoid counting shared mappings
multiple times.
--
Janne Blomqvist
g
1) to stderr, for debugging purposes when running in the foreground.
2) to syslog when running daemonized.
PS.: And if one doesn't care about non-systemd users, one can drop
option #2 and let systemd forward stderr to syslog.
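A minimal sketch of this scheme in Python's standard logging module (the
daemon name and format strings are my own placeholders, not from any
particular daemon):

```python
import logging
import logging.handlers
import sys

def setup_logging(daemonized: bool, name: str = "mydaemon") -> logging.Logger:
    """Log to stderr when run in the foreground, to syslog when daemonized."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if daemonized:
        # /dev/log is the local syslog socket on most Linux systems
        handler = logging.handlers.SysLogHandler(address="/dev/log")
        handler.setFormatter(
            logging.Formatter(f"{name}: %(levelname)s %(message)s"))
    else:
        # foreground: human-readable timestamps on stderr for debugging
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Under systemd, as noted above, one can skip the syslog branch entirely and
always log to stderr, letting systemd capture it into the journal.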
PS2.: If one needs some kind of more structured and/or binary
usively on the controller node, where more frequent
connections can prevent time decay disconnections and reduce the
likelihood of cache misses.
This is probably a good idea, particularly if one has large parallel jobs;
otherwise the nodes could DOS the AD/LDAP servers when launching if the
ca
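As an illustration only: with sssd as the caching daemon, longer entry
caching can be tuned roughly like this (the option name comes from
sssd.conf(5); the domain name and timeout value are placeholders):

# /etc/sssd/sssd.conf (excerpt)
[domain/example.com]
# cache LDAP/AD entries for 4 hours instead of the default
entry_cache_timeout = 14400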
is
to set ConstrainKmemSpace=no in cgroup.conf (but AFAICS this option was
added in slurm 17.02 and is not present in the 16.05 you're using).
For more information, see discussion and links in slurm bug #5082.
--
Janne Blomqvist
rticular usecase was easy to work around by modifying the
jupyterhub->slurm integration stuff to use sudo to submit the job, and
setting up an appropriate sudo rule.
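Such a sudo rule could look roughly like the following (the service account
name and path are hypothetical; in practice you would restrict the Runas
user list to actual hub users rather than ALL):

# /etc/sudoers.d/jupyterhub -- hypothetical example
# allow the hub service account to submit jobs as other users via sbatch
jupyterhub ALL=(ALL) NOPASSWD: /usr/bin/sbatch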
--
Janne Blomqvist
certainly more fully featured, and if
you want to use some weird and not commonly used part of the NTP spec,
chances are that ntpd supports it and chrony doesn't. Also, if you're
running stratum-0 clocks, ntpd might have better support for such things.
--
Janne Blomqvist