Slurm versions 20.02.3 and 19.05.7 are now available, and include a series of recent bug fixes, as well as a fix for a security issue with the optional message aggregation feature.

SchedMD customers were informed on May 7th and provided a patch on request; this process is documented in our security policy [1].

CVE-2020-12693:

A review of what was intended to be a minor cleanup patch uncovered an underlying race condition for systems with Message Aggregation enabled. This race condition could allow a user to launch a process as an arbitrary user.

This is only an issue for systems with Message Aggregation enabled, which we expect to be a small number of Slurm installations in practice.

Message Aggregation is off in Slurm by default, and is enabled only by setting MsgAggregationParams=WindowMsgs=<msgs>, where <msgs> is greater than 1. (Using Message Aggregation on your systems is not a recommended configuration at this time, and we may retire this subsystem in a future Slurm release in favor of other RPC aggregation techniques. Take care before disabling it, however, to avoid communication issues.)
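
As a quick way to see whether a given configuration falls under the condition described above, here is a minimal Python sketch that scans the text of a slurm.conf file for a MsgAggregationParams setting with WindowMsgs greater than 1. The function name is hypothetical, and the parsing rules are assumptions made for illustration (one parameter per line, "#" starts a comment, case-insensitive names, a later setting overrides an earlier one); it does not follow Include directives and is not a substitute for reviewing your actual configuration.

```python
import re

# Hypothetical helper, not part of Slurm: reports whether the given
# slurm.conf text enables Message Aggregation (WindowMsgs > 1).
_PARAM_RE = re.compile(r"MsgAggregationParams\s*=\s*(.*)$", re.IGNORECASE)


def message_aggregation_enabled(conf_text: str) -> bool:
    enabled = False
    for raw_line in conf_text.splitlines():
        # Assumption: everything after '#' is a comment.
        line = raw_line.split("#", 1)[0].strip()
        match = _PARAM_RE.match(line)
        if not match:
            continue
        # Assumption: a later MsgAggregationParams line overrides an earlier one.
        enabled = False
        # The value is treated as a comma-separated list of key=value options.
        for option in match.group(1).split(","):
            key, _, value = option.strip().partition("=")
            value = value.strip()
            if key.strip().lower() == "windowmsgs" and value.isdigit():
                enabled = int(value) > 1
    return enabled
```

To use it, read your slurm.conf into a string and pass the contents to message_aggregation_enabled(); the location of that file varies by installation.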

Downloads are available at https://www.schedmd.com/downloads.php .

Release notes follow below.

- Tim

[1] https://www.schedmd.com/security.php

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

* Changes in Slurm 20.02.3
==========================
 -- Factor in ntasks-per-core=1 with cons_tres.
 -- Fix formatting in error message in cons_tres.
 -- Fix calling stat on a NULL variable.
 -- Fix minor memory leak when using reservations with flags=first_cores.
 -- Fix gpu bind issue when CPUs=Cores and ThreadsPerCore > 1 on a node.
 -- Fix --mem-per-gpu for heterogeneous --gres requests.
 -- Fix slurmctld load order in load_all_part_state().
 -- Fix race condition not finding jobacct gather task cgroup entry.
 -- Suppress error message when selecting nodes on disjoint topologies.
 -- Improve performance of _pack_default_job_details() with large number of job
    arguments.
 -- Fix archive loading previous to 17.11 jobs per-node req_mem.
 -- Fix regression validating that --gpus-per-socket requires --sockets-per-node
    for steps. Should only validate allocation requests.
 -- error() instead of fatal() when parsing an invalid hostlist.
 -- nss_slurm - fix potential deadlock in slurmstepd on overloaded systems.
 -- cons_tres - fix --gres-flags=enforce-binding and related --cpus-per-gres.
 -- cons_tres - Allocate lowest numbered cores when filtering cores with gres.
 -- Fix getting system counts for named GRES/TRES.
 -- MySQL - Fix for handling typed GRES for association rollups.
 -- Fix step allocations when tasks_per_core > 1.
 -- Fix allocating more GRES than requested when asking for multiple GRES types.

* Changes in Slurm 19.05.7
==========================
 -- Fix handling of -m/--distribution options for across socket/2nd level by
    task/affinity plugin.
 -- Fix grp_node_bitmap error when slurmctld started before slurmdbd.
 -- Fix compilation issues in GCC10.
 -- Fix distributing job steps across idle nodes within a job.
 -- Break infinite loop in cons_tres dealing with incorrect tasks per tres
    request resulting in slurmctld hang.
 -- priority/multifactor - gracefully handle NULL list of associations or array
    of siblings when calculating FairTree fairshare.
 -- Fix cons_tres --exclusive=user to allocate only requested number of CPUs.
 -- Add MySQL deadlock detection and automatic retry mechanism.
 -- Fix _verify_node_state memory requested as --mem-per-gpu DefMemPerGPU.
 -- Factor in ntasks-per-core=1 with cons_tres.
 -- Fix formatting in error message in cons_tres.
 -- Fix gpu bind issue when CPUs=Cores and ThreadsPerCore > 1 on a node.
 -- Fix --mem-per-gpu for heterogeneous --gres requests.
 -- Fix slurmctld load order in load_all_part_state().
 -- Fix getting system counts for named GRES/TRES.
 -- MySQL - Fix for handling typed GRES for association rollups.
 -- Fix step allocations when tasks_per_core > 1.
