Hi, happy New Year ;-)
I just updated Slurm to 18.08.9: same behaviour. Jobs still stay PD forever instead of being refused :-( Am I the only one in this situation?

Sincerely,
Béatrice

> On 16 Dec 2019, at 09:49, Beatrice Charton <beatrice.char...@criann.fr> wrote:
>
> Hi Marcus and Bjørn-Helge,
>
> Thank you for your answers.
>
> We don't use Slurm billing; we use system accounting billing.
> I also confirm that with --exclusive there is a difference between ReqCPUS
> and AllocCPUS, but --mem-per-cpu behaved more like a --mem-per-task than a
> --mem-per-cpu: it was associated with ReqCPUS. It looks like it is now
> associated with AllocCPUS.
>
> If it's not a side effect, why are jobs not rejected instead of being
> accepted and left pending forever?
> The behaviour is the same in 19.05.2 but corrected in 19.05.3, so the
> problem seems to be known in v19 but not fixed in v18.
>
> Sincerely,
>
> Béatrice
>
>> On 12 Dec 2019, at 12:10, Marcus Wagner <wag...@itc.rwth-aachen.de> wrote:
>>
>> Hi Beatrice and Bjørn-Helge,
>>
>> I can confirm that it works with 18.08.7. We additionally use
>> TRESBillingWeights together with PriorityFlags=MAX_TRES. For example:
>> TRESBillingWeights="CPU=1.0,Mem=0.1875G,gres/gpu=12.0"
>> We use the billing factor for our external accounting, to account for the
>> nodes fairly. But we see a similar effect with --exclusive.
>> In Beatrice's case, the billing weight would be:
>> TRESBillingWeights="CPU=1.0,Mem=0.21875G"
>> So a 10-cpu job with 1 GB per cpu would be billed 10.
>> A 1-cpu job with 10 GB would be billed 2 (0.21875*10, floored).
>> An exclusive 10-cpu job with 1 GB per cpu would be billed 28 (all 28 cores
>> belong to the job).
>> An exclusive 1-cpu job with 30 GB (Beatrice's example) would be billed
>> 28(cores)*30(GB)*0.21875 => 118.125 => 118 cores.
>>
>> Best,
>> Marcus
>>
>> On 12/12/19 9:47 AM, Bjørn-Helge Mevik wrote:
>>> Beatrice Charton <beatrice.char...@criann.fr> writes:
>>>
>>>> Hi,
>>>>
>>>> We have a strange behaviour of Slurm after updating from 18.08.7 to
>>>> 18.08.8, for jobs using --exclusive and --mem-per-cpu.
>>>>
>>>> Our nodes have 128 GB of memory and 28 cores.
>>>> $ srun --mem-per-cpu=30000 -n 1 --exclusive hostname
>>>> => works in 18.08.7
>>>> => doesn't work in 18.08.8
>>>
>>> I'm actually surprised it _worked_ in 18.08.7. At one time, long before
>>> v18.08, the behaviour was changed when using --exclusive: in order to
>>> account the job for all cpus on the node, the number of cpus asked for
>>> with --ntasks is simply multiplied by "#cpus-on-node / --ntasks" (in
>>> your case: 28). Unfortunately, that also means the memory the job
>>> requires per node is "#cpus-on-node / --ntasks" multiplied by
>>> --mem-per-cpu (in your case 28 * 30000 MiB ~= 820 GiB). For this
>>> reason, we tend to ban --exclusive on our clusters (or at least warn
>>> about it).
>>>
>>> I haven't looked at the code for a long time, so I don't know whether
>>> this is still the current behaviour, but every time I've tested, I've
>>> seen the same problem. I believe I've tested on 19.05 (but I might
>>> remember wrong).
>>>
>>
>> --
>> Marcus Wagner, Dipl.-Inf.
>>
>> IT Center
>> Abteilung: Systeme und Betrieb
>> RWTH Aachen University
>> Seffenter Weg 23
>> 52074 Aachen
>> Tel: +49 241 80-24383
>> Fax: +49 241 80-624383
>> wag...@itc.rwth-aachen.de
>> www.itc.rwth-aachen.de
>
> --
> Béatrice CHARTON             | CRIANN
> beatrice.char...@criann.fr   | 745, avenue de l'Université
> Tel : +33 (0)2 32 91 42 91   | 76800 Saint Etienne du Rouvray
> --- Support : supp...@criann.fr ---

--
Béatrice CHARTON             | CRIANN
beatrice.char...@criann.fr   | 745, avenue de l'Université
Tel : +33 (0)2 32 91 42 91   | 76800 Saint Etienne du Rouvray
--- Support : supp...@criann.fr ---
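The memory blow-up Bjørn-Helge describes can be reproduced with a short sketch. This only illustrates the behaviour reported in the thread, not actual Slurm code; the helper name is invented for the example:

```python
def exclusive_mem_per_node(cpus_on_node, mem_per_cpu_mib):
    """Sketch of the reported --exclusive behaviour: the job is accounted
    for every cpu on the node, so the effective per-node memory request
    becomes #cpus-on-node * --mem-per-cpu. (Assumption taken from the
    thread, not from the Slurm source.)"""
    return cpus_on_node * mem_per_cpu_mib

# Beatrice's case: 28-core nodes, --mem-per-cpu=30000 (MiB)
req_mib = exclusive_mem_per_node(28, 30000)
print(req_mib)                # 840000 MiB, i.e. ~820 GiB
node_mem_mib = 128 * 1024     # 128 GB nodes (as stated in the thread)
print(req_mib > node_mem_mib) # True: no node can ever satisfy the request
```

Since 840000 MiB exceeds the 128 GB of any node, the request can never be satisfied, which is consistent with the job sitting in PD forever rather than being rejected at submission.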