Hi, happy New Year ;-)
I just updated Slurm to 18.08.9: same behaviour. Jobs still stay PD forever instead of being refused :-( Am I the only one in this situation?

Sincerely,
Béatrice

> On 16 Dec 2019, at 09:49, Beatrice Charton <beatrice.char...@criann.fr> wrote:
>
> Hi Marcus and Bjørn-Helge,
>
> Thank you for your answers.
>
> We don't use Slurm billing; we use system accounting billing.
> I also confirm that with --exclusive there is a difference between ReqCPUS
> and AllocCPUS, but --mem-per-cpu behaved more like a --mem-per-task than a
> --mem-per-cpu: it was associated with ReqCPUS. It looks like it is now
> associated with AllocCPUS.
>
> If it's not a side effect, why are jobs not rejected instead of being
> accepted and left pending forever?
> The behaviour is the same in 19.05.2 but corrected in 19.05.3, so the
> problem seems to be known in v19 but not fixed in v18.
>
> Sincerely,
>
> Béatrice
>
>> On 12 Dec 2019, at 12:10, Marcus Wagner <wag...@itc.rwth-aachen.de> wrote:
>>
>> Hi Beatrice and Bjørn-Helge,
>>
>> I can confirm that it works with 18.08.7. We additionally use
>> TRESBillingWeights together with PriorityFlags=MAX_TRES. For example:
>> TRESBillingWeights="CPU=1.0,Mem=0.1875G,gres/gpu=12.0"
>> We use the billing factor for our external accounting, to account for the
>> nodes fairly. But we see a similar effect with --exclusive.
>> In Beatrice's case, the billing weight would be:
>> TRESBillingWeights="CPU=1.0,Mem=0.21875G"
>> So a 10-cpu job with 1 GB per cpu would be billed 10.
>> A 1-cpu job with 10 GB would be billed 2 (0.21875*10, floored).
>> An exclusive 10-cpu job with 1 GB per cpu would be billed 28 (all 28 cores
>> belong to the job).
>> An exclusive 1-cpu job with 30 GB (Beatrice's example) would be billed
>> 28(cores)*30(GB)*0.21875 => 118.125 => 118 cores.
>>
>> Best,
>> Marcus
>>
>> On 12/12/19 9:47 AM, Bjørn-Helge Mevik wrote:
>>> Beatrice Charton <beatrice.char...@criann.fr> writes:
>>>
>>>> Hi,
>>>>
>>>> We have a strange behaviour of Slurm after updating from 18.08.7 to
>>>> 18.08.8, for jobs using --exclusive and --mem-per-cpu.
>>>>
>>>> Our nodes have 128 GB of memory and 28 cores.
>>>> $ srun --mem-per-cpu=30000 -n 1 --exclusive hostname
>>>> => works in 18.08.7
>>>> => doesn't work in 18.08.8
>>>
>>> I'm actually surprised it _worked_ in 18.08.7. At one time, long before
>>> v18.08, the behaviour was changed when using --exclusive: in order to
>>> account the job for all cpus on the node, the number of cpus asked for
>>> with --ntasks is simply multiplied by "#cpus-on-node / --ntasks" (in
>>> your case: 28). Unfortunately, that also means the memory the job
>>> requires per node is "#cpus-on-node / --ntasks" multiplied by
>>> --mem-per-cpu (in your case 28 * 30000 MiB ~= 820 GiB). For this
>>> reason, we tend to ban --exclusive on our clusters (or at least warn
>>> about it).
>>>
>>> I haven't looked at the code for a long time, so I don't know whether
>>> this is still the current behaviour, but every time I've tested, I've
>>> seen the same problem. I believe I've tested on 19.05 (but I might
>>> remember wrong).
>>>
>>
>> --
>> Marcus Wagner, Dipl.-Inf.
>>
>> IT Center
>> Abteilung: Systeme und Betrieb
>> RWTH Aachen University
>> Seffenter Weg 23
>> 52074 Aachen
>> Tel: +49 241 80-24383
>> Fax: +49 241 80-624383
>> wag...@itc.rwth-aachen.de
>> www.itc.rwth-aachen.de
>
> --
> Béatrice CHARTON             | CRIANN
> beatrice.char...@criann.fr   | 745, avenue de l'Université
> Tel : +33 (0)2 32 91 42 91   | 76800 Saint Etienne du Rouvray
> --- Support : supp...@criann.fr ---

--
Béatrice CHARTON             | CRIANN
beatrice.char...@criann.fr   | 745, avenue de l'Université
Tel : +33 (0)2 32 91 42 91   | 76800 Saint Etienne du Rouvray
--- Support : supp...@criann.fr ---
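The memory blow-up Bjørn-Helge describes can be reproduced with a short sketch. This only illustrates the behaviour reported in the thread, not actual Slurm code; the helper name is invented for the example:

```python
def exclusive_mem_per_node(cpus_on_node, mem_per_cpu_mib):
    """Sketch of the reported --exclusive behaviour: the job is accounted
    for every cpu on the node, so the effective per-node memory request
    becomes #cpus-on-node * --mem-per-cpu. (Assumption taken from the
    thread, not from the Slurm source.)"""
    return cpus_on_node * mem_per_cpu_mib

# Beatrice's case: 28-core nodes, --mem-per-cpu=30000 (MiB)
req_mib = exclusive_mem_per_node(28, 30000)
print(req_mib)                # 840000 MiB, i.e. ~820 GiB
node_mem_mib = 128 * 1024     # 128 GB nodes (as stated in the thread)
print(req_mib > node_mem_mib) # True: no node can ever satisfy the request
```

Since 840000 MiB exceeds the 128 GB of any node, the request can never be satisfied, which is consistent with the job sitting in PD forever rather than being rejected at submission.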