Slurm version 24.05.4 is now available and includes a fix for a recently discovered security issue with the new stepmgr subsystem.

SchedMD customers were informed on October 9th and provided a patch on
request; this process is documented in our security policy. [1]

A mistake in authentication handling in stepmgr could permit an attacker to execute processes under other users' jobs. Exposure is limited to jobs explicitly running with --stepmgr, or to systems that have globally enabled stepmgr through "SlurmctldParameters=enable_stepmgr" in their configuration. This issue is tracked as CVE-2024-48936.
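
For sites that want a quick way to see whether the global setting applies to them, the check can be sketched roughly as below. This is only an illustration, not a SchedMD tool: the function name is hypothetical, it assumes the slurm.conf contents have already been read into a string, and it does not follow Include directives or detect individual jobs submitted with --stepmgr.

```python
def stepmgr_globally_enabled(conf_text: str) -> bool:
    """Return True if any SlurmctldParameters line lists enable_stepmgr.

    Hypothetical helper: expects the text of slurm.conf, one
    key=value pair per line, and ignores Include directives.
    """
    for raw in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Compare the key without regard to case.
        if key.strip().lower() != "slurmctldparameters":
            continue
        options = [opt.strip().lower() for opt in value.split(",")]
        if "enable_stepmgr" in options:
            return True
    return False


sample = "ClusterName=demo\nSlurmctldParameters=enable_configless,enable_stepmgr\n"
print(stepmgr_globally_enabled(sample))
```

A False result only rules out the global setting; jobs that request --stepmgr themselves would still be affected on an unpatched system.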

Downloads are available at https://www.schedmd.com/downloads.php .

Release notes follow below.

- Tim

[1] https://www.schedmd.com/security-policy/

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

* Changes in Slurm 24.05.4
==========================
 -- Fix generic int sort functions.
 -- Fix user look up using possible unrealized uid in the dbd.
 -- Fix FreeBSD compile issue with tls/none plugin.
 -- slurmrestd - Fix regressions that allowed slurmrestd to be run as SlurmUser
    when SlurmUser was not root.
 -- mpi/pmix - Fix race conditions with het jobs at step start/end which could
    cause srun to hang.
 -- Fix not showing some SelectTypeParameters in scontrol show config.
 -- Avoid assert when dumping certain removed fields in JSON/YAML.
 -- Improve how shards are scheduled with affinity in mind.
 -- Fix MaxJobsAccruePU not being respected when MaxJobsAccruePA is set
    in the same QOS.
 -- Prevent backfill from planning jobs that use overlapping resources for the
    same time slot if the job's time limit is less than bf_resolution.
 -- Fix memory leak when requesting typed gres and --[cpus|mem]-per-gpu.
 -- Prevent backfill from breaking out due to "system state changed" every 30
    seconds if reservations use REPLACE or REPLACE_DOWN flags.
 -- slurmrestd - Make sure that scheduler_unset parameter defaults to true even
    when the following flags are also set: show_duplicates, skip_steps,
    disable_truncate_usage_time, run_away_jobs, whole_hetjob,
    disable_whole_hetjob, disable_wait_for_result, usage_time_as_submit_time,
    show_batch_script, and/or show_job_environment. Additionally, always make
    sure show_duplicates and disable_truncate_usage_time default to true when
    the following flags are also set: scheduler_unset, scheduled_on_submit,
    scheduled_by_main, scheduled_by_backfill, and/or job_started. This affects
    the following endpoints:
      'GET /slurmdb/v0.0.40/jobs'
      'GET /slurmdb/v0.0.41/jobs'
 -- Ignore --json and --yaml options for scontrol show config to prevent mixing
    output types.
 -- Fix not considering nodes in reservations with Maintenance or Overlap flags
    when creating new reservations with nodecnt or when they replace down nodes.
 -- Fix suspending/resuming steps running under a 23.02 slurmstepd process.
 -- Fix options like sprio --me and squeue --me for users with a uid greater
    than 2147483647.
 -- fatal() if BlockSizes=0. This value is invalid and would otherwise cause the
    slurmctld to crash.
 -- sacctmgr - Fix issue where clearing out a preemption list using
    preempt='' would cause the given qos to no longer be preempt-able until set
    again.
 -- Fix stepmgr creating job steps concurrently.
 -- data_parser/v0.0.40 - Avoid dumping "Infinity" for NO_VAL tagged "number"
    fields.
 -- data_parser/v0.0.41 - Avoid dumping "Infinity" for NO_VAL tagged "number"
    fields.
 -- slurmctld - Fix a potential leak while updating a reservation.
 -- slurmctld - Fix state save with reservation flags when an update fails.
 -- Fix reservation update issues with parameters Accounts and Users, when
    using +/- signs.
 -- slurmrestd - Don't dump warning on empty wckeys in:
      'GET /slurmdb/v0.0.40/config'
      'GET /slurmdb/v0.0.41/config'
 -- Fix slurmd possibly leaving zombie processes on start up in configless when
    the initial attempt to fetch the config fails.
 -- Fix crash when trying to drain a non-existing node (possibly deleted
    before).
 -- slurmctld - fix segfault when calculating limit decay for jobs with an
    invalid association.
 -- Fix IPMI energy gathering with multiple sensors.
 -- data_parser/v0.0.39 - Remove xassert requiring errors and warnings to have a
    source string.
 -- slurmrestd - Prevent potential segfault when there is an error parsing an
    array field which could lead to a double xfree. This applies to several
    endpoints in data_parser v0.0.39, v0.0.40 and v0.0.41.
 -- scancel - Fix a regression from 23.11.6 where using both the --ctld and
    --sibling options would cancel the federated job on all clusters instead of
    only the cluster(s) specified by --sibling.
 -- accounting_storage/mysql - Fix bug when removing an association
    specified with an empty partition.
 -- Fix correctly setting multiple partitions on a job during state restore.
 -- Fix difference in behavior when swapping partition order in job submission.
 -- Fix security issue in stepmgr that could permit an attacker to execute
    processes under other users' jobs. CVE-2024-48936.
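
As background to the "uid greater than 2147483647" entry above: 2147483647 is the largest value a signed 32-bit integer can hold, so larger uids no longer fit once they pass through such a type. The sketch below only illustrates that arithmetic and is not a description of the actual Slurm code path; the helper name is hypothetical.

```python
import ctypes

INT32_MAX = 2147483647  # largest signed 32-bit value


def as_signed_32(uid: int) -> int:
    """Reinterpret an unsigned 32-bit uid as a signed 32-bit integer."""
    return ctypes.c_int32(uid).value


# Values up to INT32_MAX are unchanged; larger ones wrap to negatives.
for uid in (INT32_MAX, INT32_MAX + 1, 4294967294):
    print(uid, "->", as_signed_32(uid))
```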
