Re: [slurm-users] 17.11+auks+cgroups: finished jobs hang in completing state

2018-03-26 Thread Christopher Samuel
On 26/03/18 20:50, Robbert Eggermont wrote: The suggested fix (use sigkill instead of sigterm in slurm_spank_auks to stop auks) seems to work (so far). Excellent, so glad to hear that! All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
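
For readers wondering why swapping SIGTERM for SIGKILL helps, here is a minimal sketch. It assumes a stand-in child process (a plain `sleep`, not the real auks helper) that ignores SIGTERM. Ignoring is used only as a simple analogy for the blocked signal described in the thread. SIGKILL can be neither ignored nor blocked, so it still ends the process.

```shell
#!/bin/sh
# Stand-in for a helper that never reacts to SIGTERM (hypothetical, not auks).
# An ignored signal stays ignored across exec, so the sleep inherits it.
sh -c 'trap "" TERM; exec sleep 30' &
pid=$!
sleep 1

# SIGTERM is delivered but has no effect on the child.
kill -TERM "$pid"
sleep 1
if kill -0 "$pid" 2>/dev/null; then
    term_result="alive"
else
    term_result="gone"
fi

# SIGKILL cannot be ignored or blocked, so the child exits.
kill -KILL "$pid"
wait "$pid" 2>/dev/null
if kill -0 "$pid" 2>/dev/null; then
    kill_result="alive"
else
    kill_result="gone"
fi

echo "after SIGTERM: $term_result; after SIGKILL: $kill_result"
```

The trade-off is that a process ended with SIGKILL gets no chance to clean up after itself.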

Re: [slurm-users] GrpTRES

2018-03-26 Thread Christopher Samuel
On 26/03/18 23:34, Mahmood Naderan wrote: Isn't there any comment? Did I follow the same procedure as you in order to set a limit for accounts? Sorry, was busy with work yesterday. Yes you did, here's what our script does for each project. ${SACCTMGR} -i modify account set GrpTRES=cpu=2500,gr

Re: [slurm-users] GrpTRES

2018-03-26 Thread Christopher Samuel
On 25/03/18 15:18, Mahmood Naderan wrote: Same as before Hmm, could you do "sacct -j 13" to see what account the job ran under? I can see you're in the "root" account too, which has no limits. cheers, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] Slurm submit host - container

2018-03-26 Thread rodger
On 26/03/2018 18:04, Jagga Soorma wrote: slurm_load_partitions: Zero Bytes were transmitted or received I've seen this when different versions of slurmd were running on a node and on the controller. It sometimes helps to run slurmd in the foreground. Something like: slurmd -D -vvv Regards
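
A quick way to check for the version mismatch mentioned above is to compare the version strings reported on each host. The sketch below is illustrative only: `slurm_version` and `compare_versions` are hypothetical helper names, the sample strings are made up, and it assumes the tools print a line of the form "slurm X.Y.Z" when given `-V`.

```shell
#!/bin/sh
# Hypothetical helper: take the last field of a line such as "slurm 17.11.2".
slurm_version() {
    printf '%s\n' "$1" | awk '{print $NF}'
}

# Hypothetical helper: report whether two version lines carry the same version.
compare_versions() {
    a=$(slurm_version "$1")
    b=$(slurm_version "$2")
    if [ "$a" = "$b" ]; then
        echo "match: $a"
    else
        echo "mismatch: $a vs $b"
    fi
}

# On real hosts the inputs would come from the installed tools, for example
# the output of `sinfo -V` inside the container and `slurmctld -V` on the
# controller, collected separately. The strings below are sample values.
compare_versions "slurm 17.11.2" "slurm 17.11.5"
```

A mismatch is only a hint about where to look next, not proof that it is the cause of the error.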

[slurm-users] Slurm submit host - container

2018-03-26 Thread Jagga Soorma
Hello, We are currently setting up a workflow in a container that needs to submit jobs to our slurm cluster. We have slurm configured and built in the container and the appropriate slurm config and munged config files mapped into the container. Munged seems to be working fine, but when we try to

Re: [slurm-users] GrpTRES

2018-03-26 Thread Mahmood Naderan
Hi, Isn't there any comment? Did I follow the same procedure as you in order to set a limit for accounts? Regards, Mahmood On Sun, Mar 25, 2018 at 8:48 AM, Mahmood Naderan wrote: > Same as before > > # sacctmgr modify account local set GrpTRES=cpu=1,mem=1000M > Modified account associations.

Re: [slurm-users] 17.11+auks+cgroups: finished jobs hang in completing state

2018-03-26 Thread Robbert Eggermont
FYI: I think we've run into this issue: https://github.com/hautreux/auks/issues/24 It seems to be triggered by a change in signal blocking in slurmstepd: https://github.com/SchedMD/slurm/commit/d2c83807097605f10f0b19cf2c5cb5c2c6f35ad6 The suggested fix (use sigkill instead of sigterm in slurm_s

Re: [slurm-users] 17.11+auks+cgroups: finished jobs hang in completing state

2018-03-26 Thread Robbert Eggermont
Hi Chris, On 26-03-18 05:04, Christopher Samuel wrote: Does the slurmd log report it trying to kill the auks process? The first thing I need to do is turn up the logging verbosity. https://bugs.schedmd.com/show_bug.cgi?id=4733 The fact that auks is hanging around makes me wonder if this i