Re: [slurm-users] autoreconf fails because of undefined macros (creating a new plugin)

2020-01-10 Thread Janne Blomqvist
ch contain the autoconf macros mentioned in your error message). -- Janne Blomqvist

Re: [slurm-users] Get GPU usage from sacct?

2019-11-19 Thread Janne Blomqvist
//cgroup is problematic on array jobs. Thanks for trying it out! Indeed, we only recently upgraded to 18.08 and it seems the upgrade broke it. Fixed now (or broke it if you're still on 17.11... :-) ). I also fixed the array jobs issue while at it. -- Janne Blomqvist

Re: [slurm-users] Get GPU usage from sacct?

2019-11-15 Thread Janne Blomqvist
e comment field at the end of the job. The above is an ansible role, but if you're not using ansible you can just pull the scripts from the "files" subdirectory. -- Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist Aalto University School of Science, PHYS & NBE +

Re: [slurm-users] RPM build error - accounting_storage_mysql.so

2019-11-11 Thread Janne Blomqvist
intended?  Is it a consequence of something in my environment? I vaguely recall reading somewhere that accounting_storage_mysql is deprecated and that one should use slurmdbd instead. So maybe it's possible to sidestep the problem by not trying to build that module at all? -- Janne Blomqvist,

Re: [slurm-users] Proposal for new TRES - "Processor Performance Units"....

2019-06-19 Thread Janne Blomqvist
malize" the fairshare consumption based on the geometric mean of a set of hopefully not too unrepresentative single-node benchmarks [1]. We also set a memory billing weight, and have MAX_TRES among our PriorityFlags, approximating dominant resource fairness (DRF) [2] [1] https://github.com/Aa
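A minimal sketch of the two ideas in the snippet above: a CPU weight taken as the geometric mean of benchmark results, and billing by the largest weighted resource, which is roughly what MAX_TRES does. The benchmark speedups, the memory weight and the job size are hypothetical illustration values, not taken from the linked repository or from any real configuration.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty sequence of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def max_tres_billing(usage, weights):
    """Bill a job by its largest weighted resource (simplified MAX_TRES-style rule)."""
    return max(usage[name] * weights[name] for name in usage)


# Hypothetical single-node benchmark speedups relative to a reference node type.
speedups = [1.0, 2.0, 4.0]
cpu_weight = geometric_mean(speedups)

# Hypothetical job: 4 cores and 64 GB, with an assumed memory weight of 0.25 per GB.
job = {"cpu": 4, "mem_gb": 64}
weights = {"cpu": cpu_weight, "mem_gb": 0.25}
billing = max_tres_billing(job, weights)
```

With these made-up numbers the memory request dominates the bill, which is the behaviour the dominant-resource approximation is meant to produce.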

Re: [slurm-users] Gentle memory limits in Slurm using cgroup?

2019-05-02 Thread Janne Blomqvist
RAMSpace=no ConstrainSwapSpace=yes AllowedSwapSpace=400 (Note that it's not possible to separately set the maximum swap usage. If you instead limit only the memory and not mem+swap, it will limit memory but swap usage will be unlimited.) As for the second part of your question, no, it
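A rough worked example of what AllowedSwapSpace=400 would mean for a job. It assumes that AllowedSwapSpace is a percentage of the job's allocated memory added on top of AllowedRAMSpace (assumed default 100) to form the mem+swap limit, and it ignores any MaxRAMPercent/MaxSwapPercent clamping. The function name is hypothetical; check the cgroup.conf documentation for your Slurm version before relying on the numbers.

```python
def memsw_limit_mb(allocated_mb, allowed_ram_pct=100.0, allowed_swap_pct=0.0):
    """Approximate mem+swap limit for a job, under the assumptions stated above."""
    return allocated_mb * (allowed_ram_pct + allowed_swap_pct) / 100.0


# A job that requested 1000 MB, with AllowedSwapSpace=400 as in the snippet.
limit = memsw_limit_mb(1000, allowed_swap_pct=400)
```

Under these assumptions the job could use up to five times its requested memory in RAM plus swap combined before hitting the limit.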

Re: [slurm-users] Socket Timed Out on Send/Recv Operation

2019-04-18 Thread Janne Blomqvist
isn't generally overloaded, there can still be occasional spikes causing these kinds of issues. We used to suffer from these errors as well; in our case it was enough to bump somaxconn and tcp_max_syn_backlog (we use 4096 for both). See also https://slurm.schedmd.com/high_throughput.html
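A small sketch for checking whether a Linux host already has the two kernel settings at the value mentioned above. The 4096 target comes from the message; the helper names are hypothetical, and the script only reads the values, it does not change them.

```python
from pathlib import Path

TARGET = 4096  # value quoted in the message for both settings

# sysctl names mapped to their /proc/sys files on Linux
SETTINGS = {
    "net.core.somaxconn": "/proc/sys/net/core/somaxconn",
    "net.ipv4.tcp_max_syn_backlog": "/proc/sys/net/ipv4/tcp_max_syn_backlog",
}


def read_settings(paths=SETTINGS):
    """Read the current values; settings whose file is missing are skipped."""
    values = {}
    for name, path in paths.items():
        p = Path(path)
        if p.exists():
            values[name] = int(p.read_text().split()[0])
    return values


def below_target(values, target=TARGET):
    """Return the sorted names of settings whose value is below the target."""
    return sorted(name for name, value in values.items() if value < target)


if __name__ == "__main__":
    for name in below_target(read_settings()):
        print(f"{name} is below {TARGET}")
```

Raising the values is normally done with sysctl on the controller node; the right numbers for a given site may differ from the 4096 used here.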

Re: [slurm-users] Kinda Off-Topic: data management for Slurm clusters

2019-02-25 Thread Janne Blomqvist
not delete a dataset used by a running job. But nothing concrete done yet. Anyway, I'm open to suggestions about better ideas, or existing tools that already solve this problem. -- Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist Aalto University School of Science, PHYS & NBE +358503841576 || janne.blomqv...@aalto.fi

Re: [slurm-users] slurm, memory accounting and memory mapping

2019-01-11 Thread Janne Blomqvist
for memory limits, you should also set JobAcctGatherParams=NoOverMemoryKill - If you're NOT using cgroups for memory limits, try setting JobAcctGatherParams=UsePSS which should avoid counting the shared mappings multiple times. -- Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist Aalto University School of Science, PHYS & NBE +358503841576 || janne.blomqv...@aalto.fi
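To illustrate why PSS avoids the double counting mentioned above: RSS charges every process for the full size of a shared mapping, while PSS charges each process its proportional share. The sketch below uses made-up numbers (four processes mapping the same 1000 kB region, each with 100 kB of private memory) and hypothetical helper names; it is not how Slurm gathers the data.

```python
def rss_total(processes):
    """Sum of per-process RSS: every process is charged the whole shared mapping."""
    return sum(p["private_kb"] + p["shared_kb"] for p in processes)


def pss_total(processes, sharers):
    """Sum of per-process PSS: the shared mapping is split evenly among its sharers."""
    return sum(p["private_kb"] + p["shared_kb"] / sharers for p in processes)


# Four hypothetical processes that all map the same 1000 kB shared region.
procs = [{"private_kb": 100, "shared_kb": 1000} for _ in range(4)]

rss = rss_total(procs)                      # shared region counted four times
pss = pss_total(procs, sharers=len(procs))  # shared region counted once overall
```

Here the RSS sum is 4400 kB while the PSS sum is 1400 kB, which matches the memory actually in use (1000 kB shared plus 4 x 100 kB private).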

Re: [slurm-users] An observation on SLURM's logging

2018-12-02 Thread Janne Blomqvist
g 1) to stderr, for debugging purposes when running in the foreground. 2) to syslog when running daemonized. PS.: And if one doesn't care about non-systemd users, one can drop option #2 and let systemd forward stderr to syslog. PS2.: If one needs some kind of more structured and/or binary
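The two-destination scheme described above could look roughly like this in a small Python daemon. This is only an illustration of the idea, not Slurm's logging code; the function name, logger name and the /dev/log socket path are assumptions.

```python
import logging
import logging.handlers
import sys


def make_handler(daemonized, syslog_address="/dev/log"):
    """Pick the log destination: syslog when daemonized, stderr in the foreground."""
    if daemonized:
        # Assumes a local syslog socket at syslog_address (typical on Linux).
        return logging.handlers.SysLogHandler(address=syslog_address)
    return logging.StreamHandler(sys.stderr)


# Foreground use, e.g. while debugging.
logger = logging.getLogger("exampled")  # hypothetical daemon name
handler = make_handler(daemonized=False)
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

Under systemd the syslog branch could be dropped entirely, as the message notes, since stderr is forwarded to the journal.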

Re: [slurm-users] Slurm missing non primary group memberships

2018-11-20 Thread Janne Blomqvist
usively on the controller node, where more frequent connections can prevent time decay disconnections and reduce the likelihood of cache misses. This is probably a good idea, particularly if one has large parallel jobs; otherwise the nodes could DOS the AD/LDAP servers when launching if the ca

Re: [slurm-users] can't create memory group (cgroup)

2018-09-09 Thread Janne Blomqvist
is to set ConstrainKmemSpace=no in cgroup.conf (but AFAICS this option was added in slurm 17.02 and is not present in 16.05 that you're using). For more information, see discussion and links in slurm bug #5082. -- Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist Aalto

Re: [slurm-users] --uid , --gid option is root only now :'(

2018-05-14 Thread Janne Blomqvist
rticular use case was easy to work around by modifying the jupyterhub->slurm integration stuff to use sudo to submit the job, and setting up an appropriate sudo rule. -- Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist Aalto University School of Science, PHYS & NBE +358503841576 || janne.blomqv...@aalto.fi

Re: [slurm-users] ntpd or chrony?

2018-01-15 Thread Janne Blomqvist
certainly more fully featured, and if you want to use some weird and not commonly used part of the NTP spec, chances are that ntpd supports it and chrony doesn't. Also, if you're running stratum-0 clocks, ntpd might have better support for such things. -- Janne Blomqvist, D.Sc. (Tech.