Hi Beatrice,
we are also still on 18.08.7, but we have a similar problem here with
the billing, which is much too high (cf. "[slurm-users] exclusive or
not exclusive, that is the question"). Slurm > 18.08.7 exacerbates
the problem, as those jobs don't even get scheduled :/
Best
Marcus
On 1/10/20 4:58 PM, Beatrice Charton wrote:
Hi,
Happy New Year ;-)
I just updated Slurm to 18.08.9: same behaviour. Jobs still stay PD forever
instead of being refused :-(
Am I the only one in this situation ?
Sincerely,
Béatrice
On 16 Dec 2019 at 09:49, Beatrice Charton <beatrice.char...@criann.fr> wrote:
Hi Marcus and Bjørn-Helge
Thank you for your answers.
We don’t use Slurm billing; we use system accounting billing.
I also confirm that with --exclusive there is a difference between ReqCPUS and
AllocCPUS. Previously, --mem-per-cpu behaved more like a --mem-per-task than a
--mem-per-cpu: it was associated with ReqCPUS. It looks like it is now associated
with AllocCPUS.
If it’s not a side effect, why are the jobs not rejected instead of being accepted
and left pending forever?
The behaviour is the same in 19.05.2 but corrected again in 19.05.3, so the problem
seems to be known in v19 but not fixed in v18.
Sincerely,
Béatrice
On 12 Dec 2019 at 12:10, Marcus Wagner <wag...@itc.rwth-aachen.de> wrote:
Hi Beatrice and Bjørn-Helge,
I can confirm that it works with 18.08.7. We additionally use TRESBillingWeights
together with PriorityFlags=MAX_TRES. For example:
TRESBillingWeights="CPU=1.0,Mem=0.1875G,gres/gpu=12.0"
We use the billing factor for our external accounting, so that the nodes are
accounted for fairly. But we do see a similar effect due to --exclusive.
In Beatrice's case, the billing weights would be:
TRESBillingWeights="CPU=1.0,Mem=0.21875G"
So a 10-cpu job with 1 GB per cpu would be billed 10.
A 1-cpu job with 10 GB would be billed 2 (0.21875*10, floor).
An exclusive 10-cpu job with 1 GB per cpu would be billed 28 (all 28 cores go
to the job).
An exclusive 1-cpu job with 30 GB per cpu (Beatrice's example) would be billed
28 (cores) * 30 (GB) * 0.21875 => 183.75 => 183 core-equivalents.
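For illustration only, here is a small Python sketch of that MAX_TRES arithmetic
(not Slurm code; the node size and weights are the ones above, and it assumes that
--exclusive effectively allocates all 28 cores with --mem-per-cpu applied to each):

# Rough sketch of MAX_TRES billing with the weights above.
# Not Slurm source code; it just reproduces the worked examples.
import math

CPU_WEIGHT = 1.0             # CPU=1.0
MEM_WEIGHT_PER_GB = 0.21875  # Mem=0.21875G (28 cores / 128 GB)
CORES_PER_NODE = 28

def billing(cpus, mem_per_cpu_gb, exclusive=False):
    if exclusive:
        cpus = CORES_PER_NODE      # whole node is allocated to the job
    mem_gb = cpus * mem_per_cpu_gb
    # MAX_TRES: the most "expensive" TRES on the node determines the billing
    return math.floor(max(cpus * CPU_WEIGHT, mem_gb * MEM_WEIGHT_PER_GB))

print(billing(10, 1))                  # 10
print(billing(1, 10))                  # 2
print(billing(10, 1, exclusive=True))  # 28
print(billing(1, 30, exclusive=True))  # 183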
Best
Marcus
On 12/12/19 9:47 AM, Bjørn-Helge Mevik wrote:
Beatrice Charton <beatrice.char...@criann.fr> writes:
Hi,
We have a strange behaviour of Slurm after updating from 18.08.7 to
18.08.8, for jobs using --exclusive and --mem-per-cpu.
Our nodes have 128GB of memory, 28 cores.
$ srun --mem-per-cpu=30000 -n 1 --exclusive hostname
=> works in 18.08.7
=> doesn’t work in 18.08.8
I'm actually surprised it _worked_ in 18.08.7. At one time, long before
v18.08, the behaviour was changed when using --exclusive: in order to
account the job for all cpus on the node, the number of cpus asked for
with --ntasks is simply multiplied by "#cpus-on-node / --ntasks" (so in
your case: 28). Unfortunately, that also means that the memory the job
requires per node becomes that multiplied cpu count times --mem-per-cpu
(in your case 28 * 30000 MiB ~= 820 GiB, far more than the node has).
For this reason, we tend to ban --exclusive on our clusters
(or at least warn about it).
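As a back-of-the-envelope check (plain Python, not Slurm code; the node figures
come from your mail, treating the 128 GB as GiB for simplicity):

# Why the --exclusive request can never be satisfied on a 28-core, 128 GB node.
CORES_PER_NODE = 28
NODE_MEM_MIB = 128 * 1024            # ~128 GiB per node

mem_per_cpu_mib = 30000              # srun --mem-per-cpu=30000
required_mib = CORES_PER_NODE * mem_per_cpu_mib   # 840000 MiB

print(round(required_mib / 1024, 1), "GiB required per node")   # ~820.3
print("fits on the node:", required_mib <= NODE_MEM_MIB)        # False -> job stays PD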
I haven't looked at the code for a long time, so I don't know whether
this is still the current behaviour, but every time I've tested, I've
seen the same problem. I believe I've tested on 19.05 (but I might be
misremembering).
--
Béatrice CHARTON | CRIANN
beatrice.char...@criann.fr | 745, avenue de l'Université
Tel : +33 (0)2 32 91 42 91 | 76800 Saint Etienne du Rouvray
--- Support : supp...@criann.fr ---
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de