Re: [slurm-users] Job canceled after reaching QOS limits for CPU time.

2020-10-30 Thread Zacarias Benta
And also the DMTCP project. On 30/10/2020 14:10, Thomas M. Payerle wrote: On Fri, Oct 30, 2020 at 5:37 AM Loris Bennett <loris.benn...@fu-berlin.de> wrote: Hi Zacarias, Zacarias Benta <zacar...@lip.pt> writes: > Good morning everyone. > > I'm having a "is

Re: [slurm-users] Job canceled after reaching QOS limits for CPU time.

2020-10-30 Thread Zacarias Benta
Thanks Tom, you are right: it is suspended, not pending, that I would like the job state to go into. I'll take a look at the OverTimeLimit flag and see if it helps. On 30/10/2020 14:10, Thomas M. Payerle wrote: On Fri, Oct 30, 2020 at 5:37 AM Loris Bennett <loris.benn...@fu-b

Re: [slurm-users] Job canceled after reaching QOS limits for CPU time.

2020-10-30 Thread Diego Zuccato
On 30/10/20 14:38, Zacarias Benta wrote: > I know it sounds kind of silly giving a limit and at the same time > allowing for exceptions, but we are trying to prevent the waste of > valuable cpu time. Then convince your users to use checkpointing. Then use shorter run times (we have 24h for 'nor

Re: [slurm-users] Job canceled after reaching QOS limits for CPU time.

2020-10-30 Thread Thomas M. Payerle
On Fri, Oct 30, 2020 at 5:37 AM Loris Bennett wrote: > Hi Zacarias, > > Zacarias Benta writes: > > > Good morning everyone. > > > > I'm having an "issue", I don't know if it is a "bug or a feature". > > I've created a QOS: "sacctmgr add qos myqos set GrpTRESMins=cpu=10 > > flags=NoDecay". I know

Re: [slurm-users] Job canceled after reaching QOS limits for CPU time.

2020-10-30 Thread Zacarias Benta
Hi Loris, Thanks for taking the time to reply to my message. We do indeed want to limit and not limit at the same time. I know that it is kind of tricky, but let me try to explain. Our HPC center currently limits jobs from running for more than 5 days straight when users submit single core

Re: [slurm-users] Job canceled after reaching QOS limits for CPU time.

2020-10-30 Thread Loris Bennett
Hi Zacarias, Zacarias Benta writes: > Good morning everyone. > > I'm having an "issue", I don't know if it is a "bug or a feature". > I've created a QOS: "sacctmgr add qos myqos set GrpTRESMins=cpu=10 > flags=NoDecay". I know the limit is too low, but I just wanted to > give you guys an example.

[slurm-users] Job canceled after reaching QOS limits for CPU time.

2020-10-29 Thread Zacarias Benta
Good morning everyone. I'm having an "issue", I don't know if it is a "bug or a feature". I've created a QOS: "sacctmgr add qos myqos set GrpTRESMins=cpu=10 flags=NoDecay". I know the limit is too low, but I just wanted to give you guys an example. Whenever a user submits a job and uses this Q