Re: [slurm-users] srun: error: io_init_msg_unpack: unpack error

2022-08-08 Thread Ole Holm Nielsen
On 09-08-2022 01:11, David Magda wrote: On Aug 6, 2022, at 15:13, Chris Samuel wrote: It's also safe to restart slurmd's with running jobs, though you may want to drain them before that so slurmctld won't try and send them a job in the middle. My testing has shown that this is not the case:

Re: [slurm-users] srun: error: io_init_msg_unpack: unpack error

2022-08-08 Thread David Magda
On Aug 6, 2022, at 15:13, Chris Samuel wrote: > > On 6/8/22 10:43 am, David Magda wrote: > >> It seems that the new srun(1) cannot talk to the old slurmd(8). >> Is this 'on purpose'? Does the backwards compatibility of the protocol not >> extend to srun(1)? > > That's expected, what you're

Re: [slurm-users] Suspend without gang scheduling

2022-08-08 Thread Reed Dier
Following up with a bit more specific color as to what I’m seeing, as well as a solution that I’m ashamed I didn’t come back to. If there is exclusively tier3 work queued up, gang scheduling never comes into play. If there is tier3+tier1 work queued up, tier1 gets requeued, and tier3 preempt

Re: [slurm-users] Node status (without repeats)

2022-08-08 Thread Brian Andrus
It looks to me like you have the same node in multiple partitions. If the output you are getting is basically what you want, just pipe it to 'sort -u' or 'uniq'. Brian Andrus On 8/8/2022 10:14 AM, Borchert, Christopher B ERDC-RDE-ITL-MS CIV wrote: Hello. How can I simply show the status of a no
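A minimal sketch of the deduplication suggested in this reply. It uses awk rather than 'sort -u' so that the header line stays first and the original order is kept; that substitution, the helper name dedupe_nodes, and the sample node names are assumptions made for illustration, not part of the thread.

```shell
# dedupe_nodes: read sinfo-style text on stdin and print each distinct line
# once, in the order first seen (so the header line stays at the top).
dedupe_nodes() {
    awk '!seen[$0]++'
}

# The sample input below is made up. On a real cluster one would instead run:
#   sinfo -N -o '%N %a %t' | dedupe_nodes
printf '%s\n' \
    'NODELIST AVAIL STATE' \
    'node01 up idle' \
    'node01 up idle' \
    'node02 up alloc' | dedupe_nodes
```

Note that plain 'uniq' only collapses adjacent duplicate lines, so it depends on the repeats for a node appearing next to each other.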

Re: [slurm-users] Node status (without repeats)

2022-08-08 Thread Ole Holm Nielsen
On 08-08-2022 19:14, Borchert, Christopher B ERDC-RDE-ITL-MS CIV wrote: Hello. How can I simply show the status of a node in Slurm? It repeats the same output per partition even when the partition column isn't included. $ sinfo -N -o '%N %a %t' NODELIST AVAIL STATE roy-r1-cp15b up idle roy-r1-cp

[slurm-users] Suspend without gang scheduling

2022-08-08 Thread Reed Dier
I’ve got essentially 3 “tiers” of jobs: tier1 are stateless and can be requeued; tier2 are stateful and can be suspended; tier3 are “high priority” and can preempt tier1 and tier2 with the requisite preemption modes. > $ sacctmgr show qos format=name%10,priority%10,preempt%12,preemptmode%10 >

Re: [slurm-users] Allow regular users to make reservations

2022-08-08 Thread Renfro, Michael
Going in a completely different direction than you’d planned, but for the same goal, what about making a script (shell, Python, or otherwise) that could validate all the constraints and call the scontrol program if appropriate, and then run that script via “sudo” as one of the regular users? Fr
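A rough sketch of the wrapper-script idea from this reply, under stated assumptions: the limits MAX_MINUTES and MAX_NODES, the function names, and the choice of reservation options are all hypothetical and would need to match site policy. The sketch only validates the request and prints the scontrol command it would run; it does not execute it.

```shell
# Hypothetical site limits; these values are assumptions, not from the thread.
MAX_MINUTES=240
MAX_NODES=4

# is_uint VALUE: succeed only if VALUE is a non-empty string of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# check_reservation MINUTES NODES: succeed only if both values are integers
# between 1 and the corresponding limit.
check_reservation() {
    is_uint "$1" && is_uint "$2" || return 1
    [ "$1" -ge 1 ] && [ "$1" -le "$MAX_MINUTES" ] || return 1
    [ "$2" -ge 1 ] && [ "$2" -le "$MAX_NODES" ] || return 1
}

# reservation_command USER MINUTES NODES: print the scontrol command for a
# valid request, or fail without output for an invalid one.
# The user name is taken as given here; a real wrapper run via sudo would
# more safely derive it from $SUDO_USER than from an argument.
reservation_command() {
    check_reservation "$2" "$3" || return 1
    printf 'scontrol create reservation Users=%s StartTime=now Duration=%s NodeCnt=%s\n' \
        "$1" "$2" "$3"
}

# Example request: user "alice" (made-up name), 60 minutes, 2 nodes.
reservation_command alice 60 2
```

Keeping validation separate from the scontrol call makes the policy easy to review before the script is granted sudo rights.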

[slurm-users] odd binding interaction with hint=nomultithread

2022-08-08 Thread Henderson, Brent
I've hit an issue with binding using slurm 21.08.5 that I'm hoping someone might be able to help with. I took a scan through the e-mail list but didn't see this one - apologies if I missed it. Maybe I just need a better understanding on why this is happening but feels like a bug. The issue is

[slurm-users] Allow regular users to make reservations

2022-08-08 Thread Paolo Viviani
Hello, I’m planning to develop a plugin for SLURM that would allow regular users to create reservations respecting some specific constraints on time/resources requested. Do you think it would be feasible to implement it as a plugin, or would that necessarily require modification of SLURM code? I

Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage

2022-08-08 Thread gerard . gil
Hello Miguel, Setting the limit to only one QOS works indeed but it prevents usage of several QOS for all users, and all the multi QOS possibilities. I'm thinking about how I can manage with it and if it's possible to set up a workaround in our environment. Thanks for all your help. Cordi