Re: [slurm-users] Maxjobs not being enforced

2019-09-18 Thread Juergen Salk
Dear Tina, probably a stupid question, but is there another MaxJobs limit defined somewhere above the user association in the resource limit hierarchy? For example, if MaxJobs=1 is set in the partition/job QOS and MaxJobs=100 in the user association, the QOS limit takes precedence over the user
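To make the precedence rule concrete, here is a small Python model of it. It is a sketch, not Slurm code. It assumes the order described on Slurm's Resource Limits page: partition QOS, job QOS, user association, account association(s), root/cluster association. The first level in that list where a limit is defined is the one used, and the OverPartQOS flag on the job's QOS swaps the first two levels. The function name `effective_limit` and the level keys are hypothetical, and the account associations are collapsed into a single level for brevity. In a QOS, the per-user job cap is spelled MaxJobsPerUser (MaxJobsPU) rather than MaxJobs.

```python
from typing import Dict, Optional, Tuple

# Hypothetical level names, highest precedence first (Slurm's default order).
DEFAULT_ORDER = (
    "partition_qos",
    "job_qos",
    "user_assoc",
    "account_assoc",  # simplification: Slurm walks every parent account in turn
    "root_assoc",
)


def effective_limit(
    limits: Dict[str, Optional[int]], over_part_qos: bool = False
) -> Tuple[Optional[str], Optional[int]]:
    """Return (level, value) for the first level that defines the limit.

    `limits` maps level names to an int, or None/absent when unset there.
    """
    order = list(DEFAULT_ORDER)
    if over_part_qos:
        # OverPartQOS on the job's QOS reverses partition QOS and job QOS.
        order[0], order[1] = order[1], order[0]
    for level in order:
        value = limits.get(level)
        if value is not None:
            return level, value
    return None, None  # no limit defined anywhere: effectively unlimited


if __name__ == "__main__":
    # Juergen's example: job QOS caps at 1, user association allows 100.
    print(effective_limit({"job_qos": 1, "user_assoc": 100}))
```

With the QOS limit set, the user association's 100 is never consulted, which is why a higher MaxJobs on the user can appear to be ignored.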

[slurm-users] How to trigger kernel stacktraces for stuck processes from unkillable steps

2019-09-18 Thread Christopher Samuel
Hi all, At the Slurm User Group I mentioned how to tell the kernel, from your unkillable step script, to dump information about stuck processes to the kernel log buffer (visible via dmesg and hopefully syslog'd somewhere useful for you): echo w > /proc/sysrq-trigger That's it. ;-) You pr
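If the unkillable step script is written in Python rather than shell, the same one-liner might look like the sketch below. It assumes the script is set as `UnkillableStepProgram` in slurm.conf and runs as root on the compute node. The script and function names are hypothetical. The only real interface used is writing the character `w` to `/proc/sysrq-trigger`, which asks the kernel to dump tasks in uninterruptible (blocked) state to the kernel log. Writes to that file are allowed for root regardless of the `kernel.sysrq` sysctl, which only governs the keyboard combination.

```python
#!/usr/bin/env python3
"""Hypothetical UnkillableStepProgram helper, equivalent to
`echo w > /proc/sysrq-trigger`: log stack traces of blocked (D-state) tasks."""
import sys

SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def dump_blocked_tasks(trigger_path: str = SYSRQ_TRIGGER) -> None:
    # Writing "w" requests a dump of blocked tasks into the kernel log (dmesg).
    with open(trigger_path, "w") as f:
        f.write("w")


def main() -> int:
    try:
        dump_blocked_tasks()
    except OSError as exc:
        # Typically a PermissionError when the script is not run as root.
        print(f"could not write to {SYSRQ_TRIGGER}: {exc}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

The path is a parameter only so the function can be exercised against a scratch file; in production the default is what matters.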

Re: [slurm-users] Maxjobs not being enforced

2019-09-18 Thread David Rhey
Hi, Tina, Are you able to confirm whether or not you can view the limit for the user in scontrol as well? David On Tue, Sep 17, 2019 at 4:42 PM Tina Fora wrote: > > # sacctmgr modify user lif6 set maxjobs=100 > > # sacctmgr list assoc user=lif6 format=user,maxjobs,maxsubmit,maxtresmins > U
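When comparing what sacctmgr stored against what the controller reports (for example with `scontrol show assoc_mgr users=lif6 flags=assoc`), it can help to list every association the user has. A user may have several (one per account or partition), each with its own MaxJobs, and a job is charged against only one of them. The sketch below parses `sacctmgr -n -P` output (no header, `|`-separated fields, no trailing delimiter). The helper name `parse_parsable2` is hypothetical, and the default user `lif6` is taken from the thread purely for illustration.

```python
import subprocess
import sys
from typing import Dict, List


def parse_parsable2(output: str, fields: List[str]) -> List[Dict[str, str]]:
    """Parse `sacctmgr -n -P ...` output into one dict per association.

    Unset limits appear as empty strings.
    """
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append(dict(zip(fields, line.split("|"))))
    return rows


if __name__ == "__main__":
    user = sys.argv[1] if len(sys.argv) > 1 else "lif6"
    fields = ["user", "account", "partition", "maxjobs"]
    out = subprocess.run(
        ["sacctmgr", "-n", "-P", "list", "assoc",
         f"user={user}", "format=" + ",".join(fields)],
        check=True, capture_output=True, text=True,
    ).stdout
    for row in parse_parsable2(out, fields):
        print(row)
```

An association whose `maxjobs` field is empty has no MaxJobs of its own and inherits from its parent account, so a limit set on one association does not apply to jobs charged to another.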

[slurm-users] Sharing a single machine between two groups; What's the best way define this in slurm config?

2019-09-18 Thread Benjamin Wong
Hello, I plan to purchase a GPU machine with 8 GPUs which will be shared between group A and group B. Group A is an existing group with SLURM nodes. Group B has no SLURM nodes but will have access to half of the resources on one SLURM node. I'm trying to figure out how to get SLURM to implement