Is there a direct upgrade path from 20.11.0 to 22.05.6, or does it have to be done in multiple steps?
Sid Young

On Fri, Nov 11, 2022 at 7:53 AM Marshall Garey <marsh...@schedmd.com> wrote:
> We are pleased to announce the availability of Slurm version 22.05.6.
>
> This includes a fix to core selection for steps which could result in
> random task launch failures, alongside a number of other moderate
> severity issues.
>
> - Marshall
>
> --
> Marshall Garey
> Release Management, Support, and Development
> SchedMD LLC - Commercial Slurm Development and Support
>
> * Changes in Slurm 22.05.6
> ==========================
>  -- Fix a partition's DisableRootJobs=no from preventing root jobs from working.
>  -- Fix the number of allocated cpus for an auto-adjustment case in which the job requests --ntasks-per-node and --mem (per-node) but the limit is MaxMemPerCPU.
>  -- Fix POWER_DOWN_FORCE request leaving node in completing state.
>  -- Do not count magnetic reservation queue records towards backfill limits.
>  -- Clarify error message when --send-libs=yes or BcastParameters=send_libs fails to identify shared library files, and avoid creating an empty "<filename>_libs" directory on the target filesystem.
>  -- Fix missing CoreSpec on dynamic nodes upon slurmctld restart.
>  -- Fix node state reporting when using specialized cores.
>  -- Fix number of CPUs allocated if --cpus-per-gpu used.
>  -- Add flag ignore_prefer_validation to not validate --prefer on a job.
>  -- Fix salloc/sbatch SLURM_TASKS_PER_NODE output environment variable when the number of tasks is not requested.
>  -- Permit using wildcard magic cookies with X11 forwarding.
>  -- cgroup/v2 - Add check for swap when running OOM check after task termination.
>  -- Fix deadlock caused by race condition when disabling power save with a reconfigure.
>  -- Fix memory leak in the dbd when container is sent to the database.
>  -- openapi/dbv0.0.38 - correct dbv0.0.38_tres_info.
>  -- Fix node SuspendTime, SuspendTimeout, ResumeTimeout being updated after altering partition node lists with scontrol.
>  -- jobcomp/elasticsearch - fix data_t memory leak after serialization.
>  -- Fix issue where '*' wasn't accepted in gpu/cpu bind.
>  -- Fix SLURM_GPUS_ON_NODE for shared GPU gres (MPS, shards).
>  -- Add SLURM_SHARDS_ON_NODE environment variable for shards.
>  -- Fix srun error with overcommit.
>  -- Fix bug in core selection for the default cyclic distribution of tasks across sockets, which resulted in random task launch failures.
>  -- Fix core selection for steps requesting multiple tasks per core when allocation contains more cores than required for step.
>  -- gpu/nvml - Fix MIG minor number generation when GPU minor number (/dev/nvidia[minor_number]) and index (as seen in nvidia-smi) do not match.
>  -- Fix accrue time underflow errors after slurmctld reconfig or restart.
>  -- Suppress errant errors from prolog_complete about being unable to locate "node:(null)".
>  -- Fix issue where shards were selected from multiple gpus and failed to allocate.
>  -- Fix step cpu count calculation when using --ntasks-per-gpu=.
>  -- Fix overflow problems when validating array index parameters in slurmctld and prevent a potential condition causing slurmctld to crash.
>  -- Remove dependency on json-c in slurmctld when running with power saving. Only the new "SLURM_RESUME_FILE" support relies on this, and it will be disabled if json-c support is unavailable instead.
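Whether 20.11.0 to 22.05.6 can be done in one hop depends on Slurm's documented upgrade support window and on upgrading the daemons in the usual order (slurmdbd, then slurmctld, then slurmd), so a reasonable first step is confirming which version each component actually reports on each host. Below is a minimal sketch of that check, assuming the standard Slurm binaries are on PATH; it is illustrative only, not an official upgrade procedure.

```python
#!/usr/bin/env python3
"""Illustrative sketch: print the versions reported by local Slurm
components before planning an upgrade (e.g. 20.11.0 -> 22.05.6).
The set of commands below is an assumption about a typical install;
adjust it to whatever binaries are deployed on each host."""
import shutil
import subprocess

# Standard Slurm binaries that accept a version query.
COMMANDS = [
    ["sinfo", "--version"],    # client tools (login node)
    ["scontrol", "version"],   # scontrol / controller-side version info
    ["slurmd", "-V"],          # compute node daemon
    ["slurmdbd", "-V"],        # accounting daemon (upgraded first)
]

def report_versions():
    for cmd in COMMANDS:
        if shutil.which(cmd[0]) is None:
            print(f"{cmd[0]}: not installed on this host")
            continue
        out = subprocess.run(cmd, capture_output=True, text=True)
        print(f"{' '.join(cmd)}: {out.stdout.strip() or out.stderr.strip()}")

if __name__ == "__main__":
    report_versions()
```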