Slurm versions 23.11.1, 23.02.7, and 22.05.11 are now available and address a number of recently discovered security issues, which have been assigned CVE-2023-49933 through CVE-2023-49938.

SchedMD customers were informed on November 29th and provided a patch on request; this process is documented in our security policy. [1]

There are no mitigations available for these issues; the only option is to patch and restart the affected daemons.

--------

Five issues were reported by Ryan Hall (Meta Red Team X):

1) Slurmd Message Integrity Bypass. (Slurm 23.02 and 23.11.)
   CVE-2023-49935

Permits an attacker to reuse root-level authentication tokens when interacting with the slurmd process, bypassing the RPC message hashes which protect against malicious MUNGE credential reuse.

2) Slurm Arbitrary File Overwrite. (Slurm 22.05 and 23.02.)
   CVE-2023-49938

Permits an attacker to modify the extended group list used with the sbcast subsystem, and open files with an incorrect set of extended groups.

3) Slurm NULL Pointer Dereference. (Slurm 22.05, 23.02, 23.11.)
   CVE-2023-49936

Denial of service.

4) Slurm Protocol Double Free. (Slurm 22.05, 23.02, 23.11.)
   CVE-2023-49937

Denial of service, potential for arbitrary code execution.

5) Slurm Protocol Message Extension. (Slurm 22.05, 23.02, 23.11.)
   CVE-2023-49933

Allows for malicious modification of RPC traffic that bypasses the message hash checks.

A sixth issue was discovered internally by SchedMD:

6) SQL Injection. (Slurm 23.11.)
   CVE-2023-49934

Arbitrary SQL injection against SlurmDBD's SQL database.

--------

SchedMD only issues security fixes for the supported releases (currently 23.11, 23.02 and 22.05). Due to the complexity of these fixes, we do not recommend attempting to back-port the fixes to older releases, and strongly encourage sites to upgrade to fixed versions immediately.
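
As a rough aid for sites auditing many clusters, the fixed versions listed above can be encoded in a small check. This is only a sketch: the function names are hypothetical, the version string is assumed to be a plain "major.minor.micro" triple (for example as taken from the output of "sinfo --version"), and release series newer than those named in this announcement are reported as unknown rather than guessed at.

```python
# Fixed point releases named in this announcement, keyed by (major, minor).
FIXED_MICRO = {(23, 11): 1, (23, 2): 7, (22, 5): 11}

# Oldest release series that still receives security fixes per this announcement.
OLDEST_SUPPORTED = (22, 5)


def parse_version(text):
    """Turn '23.02.6' into (23, 2, 6). Suffixes such as '-rc1' are not handled."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected major.minor.micro, got {text!r}")
    return tuple(int(p) for p in parts)


def is_patched(version):
    """Return True/False for series covered here, None for unknown newer series."""
    major, minor, micro = parse_version(version)
    series = (major, minor)
    if series in FIXED_MICRO:
        return micro >= FIXED_MICRO[series]
    if series < OLDEST_SUPPORTED:
        # Older series receive no fixes, so they are treated as unpatched.
        return False
    # Assumption: anything else is outside the scope of this announcement.
    return None


if __name__ == "__main__":
    for v in ("23.11.0", "23.02.7", "22.05.10", "21.08.8"):
        print(v, is_patched(v))
```

A result of False means the installation should be upgraded to the fixed release for its series (or to a supported series); None means the announcement says nothing about that series.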

Downloads are available at https://www.schedmd.com/downloads.php .

Release notes follow below.

- Tim

[1] https://www.schedmd.com/security.php

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

* Changes in Slurm 23.11.1
==========================
 -- Fix scontrol update job=... TimeLimit+=/-= when used with a raw JobId of job
    array element.
 -- Reject TimeLimit increment/decrement when called on job with
    TimeLimit=UNLIMITED.
 -- Fix slurmctld segfault when reconfiguring after a job resize.
 -- Fix compilation on FreeBSD.
 -- Fix issue with requesting a job with --licenses as well as
    --tres-per-task=license.
 -- slurmctld - Prevent segfault in getopt_long() with an invalid long option.
 -- Switch to man2html-base in Build-Depends for Debian package.
 -- slurmrestd - Added /meta/slurm/cluster field to responses.
 -- Adjust systemd service files to start daemons after remote-fs.target.
 -- Add "--with selinux" option to slurm.spec.
 -- Fix task/cgroup indexing tasks in cgroup plugins, which caused
    jobacct/gather to match the gathered stats with the wrong task id.
 -- select/linear - Fix regression in 23.11 in which jobs that requested
    --cpus-per-task were rejected.
 -- Fix crash in slurmstepd that can occur when launching tasks via mpi using
    the pmi2 plugin and using the route/topology plugin.
 -- Fix sgather not gathering from all nodes when using CR_PACK_NODES/-m pack.
 -- Fix mysql query syntax error when getting jobs with private data.
 -- Fix sanity check to prevent deleting default account of users.
 -- data_parser/v0.0.40 - Fix the parsing for /slurmdb/v0.0.40/jobs exit_code
    query parameter.
 -- Fix issue where TRES for energy wasn't always set before sending it to the
    jobcomp plugin.
 -- jobcomp/[kafka|elasticsearch] - Print raw TRES values along with the
    formatted versions as tres_[req|alloc]_raw.
 -- Fix inconsistencies with --cpu-bind/SLURM_CPU_BIND and --hint/SLURM_HINT.
 -- Fix ignoring invalid json in various subsystems.
 -- Remove shebang from bash completion script.
 -- Fix elapsed time in JobComp being set from invalid start and end times.
 -- Update service files to start slurmd, slurmctld, and slurmdbd after sssd.
 -- data_parser/v0.0.40 - Fix output of DefMemPerCpu, MaxMemPerCpu, and
    max_shares.
 -- When determining a job's index in the database don't wait if there are more
    jobs waiting.
 -- If a job requests more shards which would allocate more than one sharing
    GRES (gpu) per node, refuse it unless SelectTypeParameters has
    MULTIPLE_SHARING_GRES_PJ.
 -- Avoid refreshing the hwloc xml file when slurmd is reconfigured. This fixes
    an issue seen with CoreSpecCount used on nodes with Intel E-cores.
 -- Trigger fatal exit when Slurm API function is called before slurm_init() is
    called.
 -- slurmd - Fix issue with 'scontrol reconfigure' when started with '-c'.
 -- data_parser/v0.0.40 - Fix handling of negative job nice values.
 -- data_parser/v0.0.40 - Fill the "id" object for associations with the
    cluster, account, partition, and user in addition to the assoc id.
 -- data_parser/v0.0.40 - Remove unusable cpu_binding_flags enums from
    v0.0.40_job_desc_msg.
 -- Improve performance and resiliency of slurmscriptd shutdown on
    'scontrol reconfigure'.
 -- slurmrestd - Job submissions that result in the following error codes
    will be considered as successfully submitted (with a warning), instead
    of returning an HTTP 500 error back:
    ESLURM_NODES_BUSY, ESLURM_RESERVATION_BUSY, ESLURM_JOB_HELD,
    ESLURM_NODE_NOT_AVAIL, ESLURM_QOS_THRES, ESLURM_ACCOUNTING_POLICY,
    ESLURM_RESERVATION_NOT_USABLE, ESLURM_REQUESTED_PART_CONFIG_UNAVAILABLE,
    ESLURM_BURST_BUFFER_WAIT, ESLURM_PARTITION_DOWN,
    ESLURM_LICENSES_UNAVAILABLE.
 -- Fix issue with node appearing to reboot on every "scontrol reconfigure"
    when slurmd was started with the '-b' flag.
 -- Fix a slurmctld fatal error when upgrading to 23.11 and changing from
    select/cons_res to select/cons_tres at the same time.
 -- slurmctld - Fix subsequent reconfigure hanging after a failed reconfigure.
 -- slurmctld - Reject arbitrary distribution jobs that have a minimum node
    count that differs from the number of unique nodes in the hostlist.
 -- Prevent slurmdbd errors when updating reservations with names containing
    apostrophes.
 -- Prevent message extension attacks that could bypass the message hash.
    CVE-2023-49933.
 -- Prevent SQL injection attacks in slurmdbd. CVE-2023-49934.
 -- Prevent message hash bypass in slurmd which can allow an attacker to reuse
    root-level MUNGE tokens and escalate permissions. CVE-2023-49935.
 -- Prevent NULL pointer dereference on size_valp overflow. CVE-2023-49936.
 -- Prevent double-xfree() on error in _unpack_node_reg_resp().
    CVE-2023-49937.

* Changes in Slurm 23.02.7
==========================
 -- libslurm_nss - Avoid causing glibc to assert due to an unexpected return
    from slurm_nss due to an error during lookup.
 -- Fix job requests with --tres-per-task sometimes resulting in bad allocations
    that cannot run subsequent job steps.
 -- Fix issue with slurmd where srun fails to be warned when a node prolog
    script runs beyond MsgTimeout set in slurm.conf.
 -- gres/shard - Fix plugin functions to have matching parameter orders.
 -- gpu/nvml - Fix issue that resulted in the wrong MIG devices being
    constrained to a job.
 -- gpu/nvml - Fix linking issue with MIGs that prevented multiple MIGs being
    used in a single job for certain MIG configurations.
 -- Add JobAcctGatherParams=DisableGPUAcct to disable gpu accounting.
 -- Fix file descriptor leak in slurmd when using acct_gather_energy/ipmi with
    DCMI devices.
 -- sview - avoid crash when job has a node list string > 49 characters.
 -- Prevent slurmctld crash during reconfigure when packing job start messages.
 -- Preserve reason uid on reconfig.
 -- Update node reason with updated INVAL state reason if different from last
    registration.
 -- acct_gather_energy/ipmi - Improve logging of DCMI issues.
 -- conmgr - Avoid NULL dereference when using auth/none.
 -- data_parser/v0.0.39 - Fixed how deleted QOS and associations for jobs are
    dumped.
 -- burst_buffer/lua - fix stage in counter not decrementing when a job is
    cancelled during stage in. This counter is used to enforce the limit of 128
    scripts per stage.
 -- gpu/oneapi - Add support for new env vars ZE_FLAT_DEVICE_HIERARCHY and
    ZE_ENABLE_PCI_ID_DEVICE_ORDER.
 -- data_parser/v0.0.39 - Fix how the "INVALID" nodes state is dumped.
 -- data_parser/v0.0.39 - Fix parsing of flag arrays to allow multiple flags to
    be set.
 -- Avoid leaking sockets when an x11 application is closed in an allocation.
 -- Fix missing mutex unlock in group cache code which could cause slurmctld to
    freeze.
 -- Fix scrontab monthly jobs possibly skipping a month if added near the end of
    the month.
 -- Fix loading of the gpu account gather energy plugin.
 -- Fix slurmctld segfault when reconfiguring after a job resize.
 -- Fix crash in slurmstepd that can occur when launching tasks via mpi using
    the pmi2 plugin and using the route/topology plugin.
 -- data_parser/v0.0.39 - skip empty string when parsing QOS ids.
 -- Fix "qos <id> doesn't exist" error message in assoc_mgr_update_assocs to
    print the attempted new default qos, rather than the current default qos.
 -- Remove error message from assoc_mgr_update_assocs when purposefully
    resetting the default qos.
 -- data_parser/v0.0.39 - Fix segfault when POSTing data with association usage.
 -- Prevent message extension attacks that could bypass the message hash.
    CVE-2023-49933.
 -- Prevent message hash bypass in slurmd which can allow an attacker to reuse
    root-level MUNGE tokens and escalate permissions. CVE-2023-49935.
 -- Prevent NULL pointer dereference on size_valp overflow. CVE-2023-49936.
 -- Prevent double-xfree() on error in _unpack_node_reg_resp(). CVE-2023-49937.
 -- Prevent modified sbcast RPCs from opening a file with the wrong group
    permissions. CVE-2023-49938.

* Changes in Slurm 22.05.11
===========================
 -- Prevent message extension attacks that could bypass the message hash.
    CVE-2023-49933.
 -- Prevent NULL pointer dereference on size_valp overflow. CVE-2023-49936.
 -- Prevent double-xfree() on error in _unpack_node_reg_resp().
    CVE-2023-49937.
 -- Prevent modified sbcast RPCs from opening a file with the wrong group
    permissions. CVE-2023-49938.
