Dear Danny,

Do you have memory specified in the NodeName lines of your slurm.conf for these nodes (as in your other mail from today)? I would guess that removing it could help ...
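Just to illustrate what I mean (node names and values here are placeholders, not taken from your config), such a NodeName line with memory configured looks roughly like this:

  NodeName=taurusi[4001-4232] CPUs=24 RealMemory=64000 State=UNKNOWN

Dropping the RealMemory=... part (so slurmd reports the memory itself) is what I would try first.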


We have seen similar behaviour with a GRES defined when starting several job steps via srun. The first step consumed all of the GRES, so the others could not start until the first one had finished. In our case, setting

export SLURM_STEP_GRES=none

helped.
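In a batch script that would be roughly the following (just a sketch based on Danny's job.sh below, not tested on your system):

  #!/bin/bash
  #SBATCH -p haswell

  # do not request any GRES for the individual steps
  export SLURM_STEP_GRES=none
  srun hostname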



best regards,
Markus


On 2016-07-28 14:07, Danny Marc Rotscher wrote:
Hello,

Today I upgraded Slurm from 15.08.12 to 16.05.2, and at first everything looked
fine.
But after a while we got tickets reporting that batch jobs no longer work (batch
jobs with srun calls inside).
At first I thought our own patches and plugins were the culprit, but a test showed
that a plain Slurm installation also produces the following error:

rotscher@taurusi4003:~/hpcsupport> cat job.sh
#!/bin/bash
#SBATCH -p haswell

srun hostname
rotscher@taurusi4003:~/hpcsupport> sbatch job.sh
Submitted batch job 196
rotscher@taurusi4003:~/hpcsupport> cat slurm-196.out
srun: error: Unable to create job step: Memory required by task is not available

Could anybody help?
We had to downgrade Slurm and lost all jobs :-(

Kind regards,
Danny
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Danny Rotscher
HPC-Support

Technische Universität Dresden
Zentrum für Informationsdienste
und Hochleistungsrechnen (ZIH)
01062 Dresden
Tel.: +49 351 463-35853
Fax : +49 351 463-37773
E-Mail: [email protected]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


--
=====================================================
Dr. Markus Stöhr
Zentraler Informatikdienst BOKU Wien / TU Wien
Wiedner Hauptstraße 8-10
1040 Wien

Tel. +43-1-58801-420754
Fax  +43-1-58801-9420754

Email: [email protected]
=====================================================
