Dear Danny,
do you have memory specified in the NodeName section of your slurm.conf for these nodes (as in your other mail from today)? I would guess removing this could help ...
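For reference, a memory definition in the NodeName section typically looks something like this (the node names, CPU counts, and memory sizes below are made-up placeholders, not your actual configuration):

```
# Hypothetical slurm.conf fragment -- names and values are placeholders.
# RealMemory specifies the node's usable memory in megabytes.
NodeName=node[01-10] CPUs=24 RealMemory=64000 State=UNKNOWN
```

Whether removing RealMemory is the right fix will also depend on your select plugin settings, e.g. whether memory is treated as a consumable resource (CR_Memory) in SelectTypeParameters.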
We have seen similar behaviour with a GRES defined and several tasks started via srun: the first task consumed all of the GRES, so the other steps could not start until the first one had finished. In our case, setting

    export SLURM_STEP_GRES=none

helped.

Best regards,
Markus

On 2016-07-28 14:07, Danny Marc Rotscher wrote:
Hello,

today I upgraded Slurm from 15.08.12 to 16.05.2, and at first everything looked fine. But after a while we got tickets reporting that batch jobs were no longer working (batch jobs with srun inside). At first I thought our own patches and plugins were the culprit, but a test showed that a plain Slurm also produces the following error:

rotscher@taurusi4003:~/hpcsupport> cat job.sh
#!/bin/bash
#SBATCH -p haswell
srun hostname
rotscher@taurusi4003:~/hpcsupport> sbatch job.sh
Submitted batch job 196
rotscher@taurusi4003:~/hpcsupport> cat slurm-196.out
srun: error: Unable to create job step: Memory required by task is not available

Could anybody help? We had to downgrade Slurm and lost all jobs :-(

Kind regards,
Danny

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Danny Rotscher
HPC-Support
Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
01062 Dresden
Tel.: +49 351 463-35853
Fax : +49 351 463-37773
E-Mail: [email protected]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
=====================================================
Dr. Markus Stöhr
Zentraler Informatikdienst BOKU Wien / TU Wien
Wiedner Hauptstraße 8-10
1040 Wien
Tel. +43-1-58801-420754
Fax +43-1-58801-9420754
Email: [email protected]
=====================================================
