[slurm-users] Slurm version 22.05.3 is now available

Tim Wickberg Thu, 11 Aug 2022 14:02:00 -0700

We are pleased to announce the availability of Slurm version 22.05.3.

This release includes a number of low to moderate severity fixes made since the last maintenance release in June.


Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

* Changes in Slurm 22.05.3
==========================
 -- job_container/tmpfs - cleanup containers even when the .ns file isn't
    mounted anymore.
 -- Ignore the bf_licenses option if using sched/builtin.
 -- Do not clear the job's requested QOS (qos_id) when ineligible due to QOS.
 -- Emit error and add fail-safe when job's qos_id changes unexpectedly.
 -- Fix timeout value in log.
 -- openapi/v0.0.38 - fix setting of DefaultTime when dumping a partition.
 -- openapi/dbv0.0.38 - correct parsing of association QOS field.
 -- Fix LaunchParameters=mpir_use_nodeaddr.
 -- Fix various edge cases where accrue limits could be exceeded or cause
    underflow error messages.
 -- Fix issue where a job requesting --ntasks and --nodes could be wrongly
    rejected when spanning heterogeneous nodes.
 -- openapi/v0.0.38 - detect when partition PreemptMode is disabled.
 -- openapi/v0.0.38 - add QOS flag to handle partition PreemptMode=within.
 -- Add total_cpus and total_nodes values to the partition list in
    the job_submit/lua plugin.
 -- openapi/dbv0.0.38 - reject and error on invalid flag values in well defined
    flag fields.
 -- openapi/dbv0.0.38 - correct QOS preempt_mode flag requests being silently
    ignored.
 -- accounting_storage/mysql - allow QOS preempt_mode flag updates when GANG
    mode is requested.
 -- openapi/dbv0.0.38 - correct QOS flag modification requests being silently
    ignored.
 -- sacct/sinfo/squeue - use openapi/[db]v0.0.38 for --json and --yaml modes.
 -- Improve error messages when using configless and fetching the config fails.
 -- Fix segfault when reboot_from_controller is configured and scontrol reboot
    is used.
 -- Fix regression which prevented a cons_tres GPU job from being submitted to
    a cons_tres cluster from a non-cons_tres cluster.
 -- openapi/dbv0.0.38 - correct association QOS list parsing for updates.
 -- Fix rollup incorrectly divvying up unused reservation time between
    associations.
 -- slurmrestd - add SLURMRESTD_SECURITY=disable_unshare_files environment
    variable.
 -- Update rsmi detection to handle new default library location.
 -- Fix header inclusion from slurmstepd manager code leading to multiple
    definition errors when linking --without-shared-libslurm.
 -- slurm.spec - explicitly disable Link Time Optimization (LTO) to avoid
    linking errors on systems where LTO-related RPM macros are enabled by
    default and the binutils version has a bug.
 -- Fix issue in the api/step_io message writing logic leading to incorrect
    behavior in API consuming clients like srun or sattach, including a segfault
    when freeing IO buffers holding traffic from the tasks to the client.
 -- openapi/dbv0.0.38 - avoid job queries getting rejected when cluster is not
    provided by client.
 -- openapi/dbv0.0.38 - accept job state filter as verbose names instead of
    only numeric state ids.
 -- Fix regression in 22.05.0rc1: if slurmd shuts down while a prolog is
    running, the job is cancelled and the node is drained.
 -- Wait up to PrologEpilogTimeout before shutting down slurmd to allow prolog
    and epilog scripts to complete or time out. Previously, slurmd waited 120
    seconds before timing out and killing prolog and epilog scripts.
 -- GPU - Fix checking frequencies to check them all and not skip the last one.
 -- GPU - Fix logic to set frequencies properly when handling multiple GPUs.
 -- cgroup/v2 - Fix typo in error message.
 -- cgroup/v2 - More robust pattern search for events.
 -- Fix slurm_spank_job_[prolog|epilog] failures being masked if a Prolog or
    Epilog script is defined (regression in 22.05.0rc1).
 -- When a job requested nodes and can't immediately start, only report to
    the user (squeue/scontrol et al) if nodes are down in the requested list.
 -- openapi/dbv0.0.38 - Fix qos list/preempt not being parsed correctly.
 -- Fix dynamic nodes registrations mapping previously assigned nodes.
 -- Remove unnecessary limit on count of 'shared' gres.
 -- Fix shared gres on CLOUD nodes not properly initializing.
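
To tell whether a given installation already carries the fixes listed above, its reported version can be compared against 22.05.3. The following is a minimal Python sketch, not part of Slurm itself. It assumes the version text looks like "slurm 22.05.3", as printed by `sinfo --version`. The helper names (`parse_slurm_version`, `has_22_05_3_fixes`, `installed_version_text`) are hypothetical and exist only for illustration.

```python
import re
import subprocess

# Release that carries the fixes listed in this announcement.
FIXED_IN = (22, 5, 3)


def parse_slurm_version(text):
    """Extract (major, minor, micro) from text such as 'slurm 22.05.3'.

    A release-candidate suffix (e.g. '22.05.0rc1') is ignored, so the
    candidate compares equal to its base version.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no Slurm version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def has_22_05_3_fixes(version_text):
    """Return True if the reported version is 22.05.3 or newer."""
    return parse_slurm_version(version_text) >= FIXED_IN


def installed_version_text():
    """Ask the local Slurm client tools for their version string.

    Assumption: `sinfo` is on PATH; this helper is not called automatically.
    """
    result = subprocess.run(
        ["sinfo", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

On a host with the Slurm client tools installed, `has_22_05_3_fixes(installed_version_text())` would report whether the local tools are at or past this release. The check only compares version numbers, so it says nothing about site-specific patches or backports.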
