We are pleased to announce the availability of Slurm version 22.05.3.
This release includes a number of low to moderate severity fixes made since the last maintenance release in June.
Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
* Changes in Slurm 22.05.3
==========================
 -- job_container/tmpfs - cleanup containers even when the .ns file isn't
    mounted anymore.
 -- Ignore the bf_licenses option if using sched/builtin.
 -- Do not clear the job's requested QOS (qos_id) when ineligible due to QOS.
 -- Emit error and add fail-safe when job's qos_id changes unexpectedly.
 -- Fix timeout value in log.
 -- openapi/v0.0.38 - fix setting of DefaultTime when dumping a partition.
 -- openapi/dbv0.0.38 - correct parsing of association QOS field.
 -- Fix LaunchParameters=mpir_use_nodeaddr.
 -- Fix various edge cases where accrue limits could be exceeded or cause
    underflow error messages.
 -- Fix issue where a job requesting --ntasks and --nodes could be wrongly
    rejected when spanning heterogeneous nodes.
 -- openapi/v0.0.38 - detect when partition PreemptMode is disabled.
 -- openapi/v0.0.38 - add QOS flag to handle partition PreemptMode=within.
 -- Add total_cpus and total_nodes values to the partition list in the
    job_submit/lua plugin.
 -- openapi/dbv0.0.38 - reject and error on invalid flag values in well
    defined flag fields.
 -- openapi/dbv0.0.38 - correct QOS preempt_mode flag requests being silently
    ignored.
 -- accounting_storage/mysql - allow QOS preempt_mode flag updates when GANG
    mode is requested.
 -- openapi/dbv0.0.38 - correct QOS flag modification requests being silently
    ignored.
 -- sacct/sinfo/squeue - use openapi/[db]v0.0.38 for --json and --yaml modes.
 -- Improve error messages when using configless and fetching the config
    fails.
 -- Fix segfault when reboot_from_controller is configured and scontrol reboot
    is used.
 -- Fix regression which prevented a cons_tres gpu job from being submitted to
    a cons_tres cluster from a non-cons_tres cluster.
 -- openapi/dbv0.0.38 - correct association QOS list parsing for updates.
 -- Fix rollup incorrectly divvying up unused reservation time between
    associations.
 -- slurmrestd - add SLURMRESTD_SECURITY=disable_unshare_files environment
    variable.
 -- Update rsmi detection to handle new default library location.
 -- Fix header inclusion from slurmstepd manager code leading to multiple
    definition errors when linking --without-shared-libslurm.
 -- slurm.spec - explicitly disable Link Time Optimization (LTO) to avoid
    linking errors on systems where LTO-related RPM macros are enabled by
    default and the binutils version has a bug.
 -- Fix issue in the api/step_io message writing logic leading to incorrect
    behavior in API consuming clients like srun or sattach, including a
    segfault when freeing IO buffers holding traffic from the tasks to the
    client.
 -- openapi/dbv0.0.38 - avoid job queries getting rejected when cluster is not
    provided by client.
 -- openapi/dbv0.0.38 - accept job state filter as verbose names instead of
    only numeric state ids.
 -- Fix regression in 22.05.0rc1: if slurmd shuts down while a prolog is
    running, the job is cancelled and the node is drained.
 -- Wait up to PrologEpilogTimeout before shutting down slurmd to allow prolog
    and epilog scripts to complete or time out. Previously, slurmd waited 120
    seconds before timing out and killing prolog and epilog scripts.
 -- GPU - Fix checking frequencies to check them all and not skip the last
    one.
 -- GPU - Fix logic to set frequencies properly when handling multiple GPUs.
 -- cgroup/v2 - Fix typo in error message.
 -- cgroup/v2 - More robust pattern search for events.
 -- Fix slurm_spank_job_[prolog|epilog] failures being masked if a Prolog or
    Epilog script is defined (regression in 22.05.0rc1).
 -- When a job requested nodes and can't immediately start, only report to the
    user (squeue/scontrol et al) if nodes are down in the requested list.
 -- openapi/dbv0.0.38 - Fix qos list/preempt not being parsed correctly.
 -- Fix dynamic nodes registrations mapping previously assigned nodes.
 -- Remove unnecessary limit on count of 'shared' gres.
 -- Fix shared gres on CLOUD nodes not properly initializing.
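
For sites that want to confirm a host has picked up these fixes, a minimal sketch of a version check follows. It assumes the client tools are on PATH and that `sinfo --version` prints a string of the form "slurm 22.05.3". The helper names are hypothetical and not part of Slurm, and release-candidate suffixes such as "rc1" are ignored.

```python
import re
import subprocess

# Release announced in this message.
MIN_VERSION = (22, 5, 3)


def parse_slurm_version(text):
    """Extract (major, minor, micro) from text such as 'slurm 22.05.3'.

    Any suffix after the micro number (for example 'rc1') is ignored.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def has_release(text, minimum=MIN_VERSION):
    """Return True if the reported version is at least `minimum`."""
    return parse_slurm_version(text) >= minimum


def installed_version_text():
    """Ask the local Slurm client tools for their version string.

    Assumption: sinfo is installed and on PATH.
    """
    result = subprocess.run(
        ["sinfo", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

On a node with Slurm installed, `has_release(installed_version_text())` would report whether the local tools are at 22.05.3 or later.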