We are pleased to announce the availability of Slurm version 17.11.9.
This release includes 10 fixes made since 17.11.8 was released last month, among them a fix to prevent hung srun processes that can occur during massively parallel jobs.
Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

* Changes in Slurm 17.11.9
==========================
 -- Fix segfault in slurmctld when a job's node bitmap is NULL during a
    scheduling cycle. Primarily caused by EnforcePartLimits=ALL.
 -- Remove erroneous unlock in acct_gather_energy/ipmi.
 -- Enable support for hwloc version 2.0.1.
 -- Fix 'srun -q' (--qos) option handling.
 -- Fix socket communication issue that can lead to lost task completion
    messages, which will cause a permanently stuck srun process.
 -- Handle creation of TMPDIR if environment variable is set or changed in
    a task prolog script.
 -- Avoid node layout fragmentation if running with a fixed CPU count but
    without Sockets and CoresPerSocket defined.
 -- burst_buffer/cray - Fix datawarp swap default pool overriding jobdw.
 -- Fix incorrect job priority assignment for multi-partition job with
    different PriorityTier settings on the partitions.
 -- Fix sinfo to print correct node state
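
As an illustration of the TMPDIR entry above, here is a minimal sketch of a task prolog that changes TMPDIR for each task. It relies on the documented TaskProlog convention that a line of the form "export NAME=value" written to standard output sets that variable in the task's environment. The function name, the /tmp base path, and the slurm_job_ directory naming are assumptions made for this example and are not part of the release.

```python
#!/usr/bin/env python3
"""Hypothetical TaskProlog script: point each task at a per-job TMPDIR."""
import os


def tmpdir_directive(env, base="/tmp"):
    # SLURM_JOB_ID is expected in the task environment; "unknown" is only a
    # fallback so this sketch can be run outside of a job.
    job_id = env.get("SLURM_JOB_ID", "unknown")
    # A TaskProlog stdout line "export NAME=value" sets NAME for the task
    # that is about to be launched.
    return "export TMPDIR={}/slurm_job_{}".format(base, job_id)


if __name__ == "__main__":
    print(tmpdir_directive(os.environ))
```

Such a script would be referenced through the TaskProlog parameter in slurm.conf (or srun --task-prolog). The script itself does not create the directory; per the changelog entry, that case is what 17.11.9 is meant to handle.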