Slurm versions 16.05.11, 17.02.9 and 17.11.0rc2 are now available, and include a series of recent bug fixes as well as a fix for a recently discovered security vulnerability (CVE-2017-15566).

Downloads are available at https://www.schedmd.com/downloads.php .

Ryan Day (LLNL) reported an issue in SPANK environment variable handling that could allow any normal user to execute code as root during the Prolog or Epilog. All systems using a Prolog or Epilog script are vulnerable, regardless of whether SPANK plugins are in use.
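The changelog entries for all three releases describe the fix as always prepending SPANK_ to user-set environment variables before they reach the Prolog and Epilog. The Python sketch below models only that idea and is not Slurm's C implementation. The function name build_prolog_env is hypothetical. LD_PRELOAD is just an example of a variable that would be dangerous in a root process; this announcement does not say which variables were abused.

```python
from typing import Dict, Mapping

# Prefix the changelog says is always prepended to user-set variables.
SPANK_PREFIX = "SPANK_"


def build_prolog_env(system_env: Mapping[str, str],
                     user_env: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical model of the CVE-2017-15566 fix (not Slurm's C code).

    Every user-supplied name is unconditionally prefixed with SPANK_, so
    user input can only ever create SPANK_* variables and can never set
    or override an unprefixed variable such as PATH or LD_PRELOAD in the
    environment of a root-run Prolog or Epilog.
    """
    env = dict(system_env)  # copy so the caller's mapping is untouched
    for name, value in user_env.items():
        env[SPANK_PREFIX + name] = value
    return env


if __name__ == "__main__":
    env = build_prolog_env({"PATH": "/usr/bin"},
                           {"LD_PRELOAD": "/tmp/example.so"})
    print(sorted(env))  # ['PATH', 'SPANK_LD_PRELOAD']
```

In this model, a Prolog or Epilog script can tell user-provided values apart from the environment Slurm itself sets up, because only the former carry the SPANK_ prefix.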

This issue affects all Slurm versions from 15.08.0 (August 2015) to present. This issue was reported to SchedMD on October 16th. SchedMD customers were informed on October 17th and provided a patch on request. This is in keeping with our responsible disclosure process [1].
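This announcement gives the affected range (15.08.0 onward) and the fixed releases (16.05.11, 17.02.9 and 17.11.0rc2). From those, a quick version-string check might look like the Python sketch below. It makes three assumptions. Version strings are taken to look like 16.05.10 or 17.11.0rc1. The 15.08 series is treated as affected because no fixed 15.08 release is listed. Builds patched locally with the patch SchedMD provided on request cannot be detected. The helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?$")

# Branches covered by this announcement, mapped to the first fixed release.
# None means no fixed release in that branch is listed (assumed affected).
_FIXED_IN = {
    (15, 8): None,
    (16, 5): "16.05.11",
    (17, 2): "17.02.9",
    (17, 11): "17.11.0rc2",
}


def _parse(version: str) -> Tuple[int, int, int, float]:
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognised Slurm version: {version!r}")
    major, minor, micro, rc = m.groups()
    # A final release sorts after all of its release candidates.
    rc_rank = float(rc) if rc is not None else float("inf")
    return int(major), int(minor), int(micro), rc_rank


def is_vulnerable(version: str) -> bool:
    """True if `version` falls in the CVE-2017-15566 range described here.

    Locally patched builds cannot be detected from the version string.
    """
    parsed = _parse(version)
    branch = parsed[:2]
    if branch not in _FIXED_IN:
        return False  # outside the 15.08 to 17.11 range this announcement covers
    fixed: Optional[str] = _FIXED_IN[branch]
    if fixed is None:
        return True
    return parsed < _parse(fixed)


if __name__ == "__main__":
    for v in ("16.05.10", "16.05.11", "17.11.0rc1"):
        print(v, is_vulnerable(v))  # True, False, True respectively
```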

The only mitigation, aside from installing a patched version, is to disable both Prolog and Epilog settings on your system and restart all slurmd processes.
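To see whether a node is exposed, you can check the slurm.conf in use for non-empty Prolog or Epilog settings. The hypothetical Python helper below is a minimal sketch with these assumptions: each setting is a simple Key=Value entry on its own line, key names are matched case-insensitively, and # comments are stripped. It does not follow Include directives. It also deliberately matches only the Prolog and Epilog keys named above, so settings such as PrologSlurmctld are ignored.

```python
from typing import Dict, Iterable


def prolog_epilog_settings(lines: Iterable[str]) -> Dict[str, str]:
    """Return any non-empty Prolog= / Epilog= values found in slurm.conf text."""
    found: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip().lower()
        value = value.strip()
        # Exact key match, so e.g. "PrologSlurmctld" is not reported.
        if key in ("prolog", "epilog") and value:
            found[key] = value
    return found


if __name__ == "__main__":
    with open("/etc/slurm/slurm.conf") as f:  # path is an assumption
        print(prolog_epilog_settings(f))
```

An empty result suggests, under the assumptions above, that neither setting is active in that file.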

Release notes follow below. Please note that support for the 16.05 release series ends in November as support for the upcoming 17.11 release starts, and as such 16.05.11 will be the final maintenance update for that branch.

Note that 17.11.0rc2 is the second release candidate for the 17.11 series, and is not considered a stable release suited for production use. We do encourage sites to test it and report issues ahead of the 17.11.0 release in November.

One last note for subscribers to the slurm-announce list: As part of a transition in our mailing list software to Mailman, the List-Id header will be changing on all future messages to:
List-Id: Slurm Announce List <slurm-announce.lists.schedmd.com>
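Sites that filter mail on this header will need to match the new value. A minimal Python sketch using the standard email package is shown below. It assumes filtering on the identifier inside the angle brackets, which is the part RFC 2919 defines as the list identifier. The function name is hypothetical.

```python
from email import message_from_string
from email.message import Message

NEW_LIST_ID = "slurm-announce.lists.schedmd.com"


def is_slurm_announce(msg: Message) -> bool:
    """True if the message's List-Id identifier matches the new value."""
    list_id = str(msg.get("List-Id", ""))  # header lookup is case-insensitive
    start, end = list_id.rfind("<"), list_id.rfind(">")
    if start == -1 or end <= start:
        return False
    return list_id[start + 1:end].strip().lower() == NEW_LIST_ID


if __name__ == "__main__":
    raw = ("List-Id: Slurm Announce List <slurm-announce.lists.schedmd.com>\n"
           "Subject: test\n\nbody\n")
    print(is_slurm_announce(message_from_string(raw)))  # True
```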

- Tim

[1] https://www.schedmd.com/security.php

--
Tim Wickberg
Director of Support, SchedMD, LLC
Commercial Slurm Development and Support

* Changes in Slurm 17.11.0rc2
==============================
 -- Prevent slurmctld abort with NodeFeatures=knl_cray and non-KNL nodes lacking
    any configured features.
 -- The --cpu_bind and --mem_bind options have been renamed to --cpu-bind
    and --mem-bind for consistency with the rest of Slurm's options. Both
    old and new syntaxes are supported for now.
 -- Add slurmdb_connection_commit to the slurmdb api to commit when needed.
 -- Add the federation APIs to the slurmdb.h file.
 -- Add job functions to the db_api.
 -- Fix sacct to always use the db_api instead of sometimes calling functions
    directly.
 -- Fix sacctmgr to always use the db_api instead of sometimes calling functions
    directly.
 -- Fix sreport to always use the db_api instead of sometimes calling functions
    directly.
 -- Add a global uid to the db_api to minimize calls to getuid().
 -- Add support for HWLOC version 2.0.
 -- Added more validation logic for updates to node features.
 -- Added node_features_p_node_update_valid() function to node_features plugin.
 -- If a job is held due to bad constraints and a node's features change, then
    test the job again to see if it can run with the new features.
 -- Added node_features_p_changible_feature() function to node_features plugin.
 -- Avoid rebooting a node if a job's requested feature is not under the control
    of the node_features plugin and is not currently active.
 -- node_features/knl_generic plugin: Do not clear a node's non-KNL features
    specified in slurm.conf.
 -- Added SchedulerParameters configuration option "disable_hetero_steps" to
    disable job steps that span multiple components of a heterogeneous job.
    Disabled by default except with mpi/none plugin. This limitation is to be
    removed in Slurm version 18.08.
 -- Fix security issue in Prolog and Epilog by always prepending SPANK_ to
    all user-set environment variables. CVE-2017-15566.

* Changes in Slurm 17.02.9
==========================
 -- When resuming powered down nodes, mark DOWN nodes right after ResumeTimeout
    has been reached (previous logic would wait about one minute longer).
 -- Fix sreport not showing full column name for TRES Count.
 -- Fix slurmdb_reservations_get() giving wrong usage data when jobs spanned a
    reservation that was modified.
 -- Fix sreport reservation utilization report showing bad data.
 -- Show all TRES on a reservation in sreport reservation utilization report by
    default.
 -- Fix sacctmgr show reservation handling of the "end" parameter.
 -- Work around issue with sysmacros.h and gcc7 / glibc 2.25.
 -- Fix layouts code to only allow setting a boolean.
 -- Fix sbatch --wait to keep waiting even if a message timeout occurs.
 -- CRAY - If configured with NodeFeatures=knl_cray and there are non-KNL
    nodes which include no features, the slurmctld will abort without
    this patch when attempting strtok_r(NULL).
 -- Fix regression in 17.02.7 which would run the spank_task_privileged as
    part of the slurmstepd instead of its child process.
 -- Fix security issue in Prolog and Epilog by always prepending SPANK_ to
    all user-set environment variables. CVE-2017-15566.

* Changes in Slurm 16.05.11
===========================
 -- burst_buffer/cray - Add support for line continuation.
 -- If a job is cancelled by the user while its allocated nodes are being
    reconfigured (i.e. the capmc_resume program is rebooting nodes for the job)
    and the node reconfiguration fails (i.e. the reboot fails), then don't
    requeue the job but leave it in a cancelled state.
 -- capmc_resume (Cray resume node script) - Do not disable changing a node's
    active features if SyscfgPath is configured in the knl.conf file.
 -- Fix memory error when updating a job's licenses.
 -- Fix double read lock of tres when updating gres or licenses on a job.
 -- Fix regression in 16.05.10 with respect to GrpTresMins on a QOS or
    Association.
 -- ALPS - Fix scheduling when ALPS doesn't agree with Slurm on what nodes
    are free.
 -- Fix seg fault when attempting to load a non-existent burst buffer plugin.
 -- Fix to backfill scheduling with respect to QOS and association limits. Jobs
    submitted to multiple partitions are most likely to be affected.
 -- Avoid erroneous errno set by the mariadb 10.2 api.
 -- Fix security issue in Prolog and Epilog by always prepending SPANK_ to
    all user-set environment variables. CVE-2017-15566.
