Slurm version 14.11.8 includes about 30 relatively minor bug fixes
developed over the past seven weeks, while version 15.08.0-pre6
contains new development scheduled for release next month. Details of
changes are shown below. Slurm downloads are available from
http://www.schedmd.com/#repos
Also note that registration is open for the 2015 Slurm User Group
Meeting. A preliminary agenda, registration and hotel information are
available from
http://slurm.schedmd.com/slurm_ug_agenda.html
* Changes in Slurm 14.11.8
==========================
-- Eliminate need for user to set user_id on job_update calls.
-- Correct list of unavailable nodes reported in a job's "reason" field when
that job cannot start.
-- Map job --mem-per-cpu=0 option to --mem=0.
-- Fix squeue -o %m and %d unit conversion to megabytes.
-- Fix issue with incorrect time calculation in the priority plugin when
a job runs past its time limit.
-- Prevent users from setting job's partition to an invalid partition.
-- Fix sreport core dump when requesting
'job SizesByAccount grouping=individual'.
-- select/linear: Correct count of CPUs allocated to job on systems with
hyperthreads.
-- Fix race condition where the last array task might not get updated in the db.
-- Cray: Remove libpmi from RPM install.
-- Fix squeue -o %X output to correctly handle NO_VAL and suffix.
-- When deleting a job from the system, set its job_id to 0 to avoid memory
corruption if a thread uses the pointer and bases validity on the ID.
-- Fix issue where sbatch would set ntasks-per-node to 0, causing any
subsequent srun to fail with a divide-by-zero error.
-- switch/cray: Refine logic to set PMI_CRAY_NO_SMP_ENV environment variable.
-- When sacctmgr loads archives from versions older than 14.11, set the array
task ID to NO_VAL so sacct can display the job IDs correctly.
-- When using the memory cgroup, if a task uses more memory than requested,
the failures are logged by cgroup into the memory.failcnt count file and
slurmstepd notifies the user.
-- Fix scheduling inconsistency with GRES bound to specific CPUs.
-- If a user belongs to a group that has split entries in /etc/group,
search for the username in all of those entries.
-- Do not mark nodes that were explicitly powered up as DOWN with a reason of
"Node unexpected rebooted".
-- Use correct slurmd spooldir when creating cpu-frequency locks.
-- Note that TICKET_BASED fairshare will be deprecated in the future.
Consider using the FAIR_TREE algorithm instead.
-- Set job's reason to BadConstraints when job can't run on any node.
-- Prevent abort on update of reservation with no nodes (licenses only).
-- Prevent slurmctld from dumping core if job_resrcs is missing in the
job data structure.
-- Fix squeue to print array task ids according to man page when
SLURM_BITSTR_LEN is defined in the environment.
-- In squeue, sort jobs by array job ID if available.
-- Fix the calculation of job energy by not including the NO_VAL values.
-- Advanced reservation fixes: enable update of BlueGene reservations, avoid
abort on multi-core reservations.
-- Set the totalview_stepid to the value of the job step instead of NO_VAL.
-- Fix slurmdbd core dump if the daemon does not have a connection to
the database.
-- Display error message when attempting to modify priority of a held job.
-- Backfill scheduler: The configured backfill_interval value (default 30
seconds) is now interpreted as a maximum run time for the backfill
scheduler. Once reached, the scheduler will build a new job queue and
start over, even if not all jobs have been tested.
-- Backfill scheduler now considers OverTimeLimit and KillWait configuration
parameters to estimate when running jobs will exit.
-- Correct task layout with CR_Pack_Node option and more than 1 CPU per task.
-- Fix the scontrol man page description of the release argument.
-- When job QOS is modified, do so before attempting to change partition in
order to validate the partition's Allow/DenyQOS parameter.
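The memory cgroup entry relies on the cgroup v1 memory controller, which increments memory.failcnt each time usage hits the limit. The Python sketch below shows the kind of post-step check involved. It is only an illustration: the cgroup directory layout, the helpers `read_failcnt` and `check_step`, and the warning text are hypothetical, not slurmstepd code.

```python
import tempfile
from pathlib import Path

def read_failcnt(cgroup_dir):
    """Return how often the cgroup v1 memory limit was hit (0 if the file is absent)."""
    path = Path(cgroup_dir) / "memory.failcnt"
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return 0

def check_step(cgroup_dir):
    """Hypothetical post-step check: return a user-facing warning, or None."""
    count = read_failcnt(cgroup_dir)
    if count > 0:
        # Illustrative message only; not the text slurmstepd prints.
        return f"step exceeded its memory limit {count} time(s)"
    return None

if __name__ == "__main__":
    # Simulate a step cgroup directory; on a real node it would live under the
    # memory controller mount point (the exact path layout is an assumption).
    with tempfile.TemporaryDirectory() as d:
        Path(d, "memory.failcnt").write_text("3\n")
        print(check_step(d))  # step exceeded its memory limit 3 time(s)
```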
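Split group entries occur when one large group is written across several /etc/group lines that share the same name and GID, so a lookup that stops at the first matching line can miss members. A minimal Python sketch of the corrected search, assuming the file contents are passed in as a list of lines; `user_in_group` is a hypothetical helper, not Slurm's implementation.

```python
def user_in_group(group_lines, gid, user):
    """Return True if user is listed in ANY /etc/group entry with this GID.

    Stopping at the first matching entry would miss members of split groups.
    """
    for line in group_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 4:
            continue  # skip malformed lines
        _name, _passwd, entry_gid, members = fields
        if int(entry_gid) == gid and user in members.split(","):
            return True
    return False

if __name__ == "__main__":
    # Example data: "staff" is split across two lines sharing GID 50.
    etc_group = [
        "staff:x:50:alice,bob",
        "staff:x:50:carol",
    ]
    print(user_in_group(etc_group, 50, "carol"))  # True
```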
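The job energy fix matters because NO_VAL is an in-band sentinel: Slurm's 32-bit NO_VAL is 0xfffffffe, so adding it as if it were a reading inflates the total by about 4.29 billion. A minimal Python sketch of the corrected aggregation, assuming per-node readings arrive as plain integers; `job_energy` is a hypothetical name.

```python
# Mirrors Slurm's 32-bit "no value" sentinel (0xfffffffe).
NO_VAL = 0xFFFFFFFE

def job_energy(node_readings):
    """Sum per-node consumed energy, skipping nodes that reported NO_VAL."""
    return sum(r for r in node_readings if r != NO_VAL)

if __name__ == "__main__":
    readings = [1200, NO_VAL, 800]  # example data; the middle node had no reading
    print(job_energy(readings))     # 2000
```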
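The backfill_interval change turns the interval into a time budget for each scheduling pass. A simplified Python model of that control flow, assuming a `try_job` callback that tests one pending job and an injectable clock for testing. All names are hypothetical, and the real scheduler also handles locking and many other limits.

```python
import time

def backfill_pass(job_queue, try_job, max_run_time=30.0, clock=time.monotonic):
    """Test jobs in priority order until max_run_time seconds have elapsed.

    Returns the number of jobs tested. When the budget runs out, the caller is
    expected to build a fresh job queue and start a new pass, even though some
    jobs in this queue were never tested.
    """
    deadline = clock() + max_run_time
    tested = 0
    for job in job_queue:
        if clock() >= deadline:
            break  # budget exhausted
        try_job(job)
        tested += 1
    return tested

if __name__ == "__main__":
    # Fake clock: every call advances 10 s, so only part of the queue is tested.
    ticks = iter(range(0, 1000, 10))
    n = backfill_pass(["j1", "j2", "j3", "j4", "j5"], lambda job: None,
                      max_run_time=30, clock=lambda: next(ticks))
    print(n)  # 2
```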
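The OverTimeLimit and KillWait entry changes how the backfill scheduler predicts when a running job frees its nodes: a job may run OverTimeLimit minutes past its time limit and then take up to KillWait seconds to be killed. A simplified Python model of that estimate, assuming the three values are simply added. `estimated_release` is a hypothetical helper; the defaults shown match the slurm.conf defaults (OverTimeLimit=0, KillWait=30), and an UNLIMITED OverTimeLimit is not modeled.

```python
from datetime import datetime, timedelta

def estimated_release(start, time_limit_min, over_time_limit_min=0, kill_wait_sec=30):
    """Latest time a running job is expected to free its nodes.

    time_limit_min and over_time_limit_min are in minutes (as OverTimeLimit is);
    kill_wait_sec is in seconds (as KillWait is).
    """
    return start + timedelta(minutes=time_limit_min + over_time_limit_min,
                             seconds=kill_wait_sec)

if __name__ == "__main__":
    start = datetime(2015, 7, 1, 10, 0, 0)
    print(estimated_release(start, 60, over_time_limit_min=5))  # 2015-07-01 11:05:30
```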
* Changes in Slurm 15.08.0pre6
==============================
-- Add scontrol options to view and modify layouts tables.
-- Add MsgAggregationParams, which controls a reverse tree to the slurmctld
that can be used to aggregate messages to the slurmctld into a single
message, reducing communication with the slurmctld. Currently only epilog
complete messages and node registration messages use this logic.
-- Add sacct and squeue options to print trackable resources.
-- Add sacctmgr option to display trackable resources.
-- If an salloc or srun command is executed on a "front-end" configuration,
that job will be assigned a slurmd shepherd daemon on the same host as used
to execute the command when possible, rather than a slurmd daemon on an
arbitrary front-end node.
-- Add srun --accel-bind option to control how tasks are bound to GPUs and
NIC Generic RESources (GRES).
-- gres/nic plugin modified to set OMPI_MCA_btl_openib_if_include environment
variable based upon allocated devices (usable with OpenMPI and Mellanox).
-- Make it so info options for srun/salloc/sbatch print with just 1 -v
instead of 4.
-- Add "no_backup_scheduling" SchedulerParameter to prevent jobs from being
scheduled when the backup takes over. Jobs can be submitted, modified and
cancelled while the backup is in control.
-- Enable native Slurm backup controller to reside on an external Cray node
when the "no_backup_scheduling" SchedulerParameter is used.
-- Remove TICKET_BASED fairshare. Consider using the FAIR_TREE algorithm.
-- Disable advanced reservation "REPLACE" option on IBM Bluegene systems.
-- Add support for controlling the distribution of tasks across cores (in
addition to existing support for nodes and sockets), e.g. "block", "cyclic"
or "fcyclic" task distribution at 3 levels in the hardware rather than 2.
-- Create db index on <cluster>_assoc_table.acct. Without it, deleting
accounts that didn't have jobs in the job table could take a long time.
-- The performance of Profiling with HDF5 is improved. In addition, internal
structures are changed to make it easier to add new profile types,
particularly energy sensors. sh5util will continue to work with either
format.
-- Add partition information to sshare output if the --partition option
is specified on the sshare command line.
-- Add sreport -T/--tres option to identify Trackable RESources (TRES) to
report.
-- Display job in sacct when a single step's CPU count differs from the job
allocation.
-- Add association usage information to "scontrol show cache" command output.
-- MPI/MVAPICH plugin now requires Munge for authentication.
-- job_submit/lua: Add default_qos fields. Add job record qos. Add partition
record allow_qos and qos_char fields.
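The core-level task distribution entry extends block/cyclic placement from two hardware levels (nodes, sockets) to three (nodes, sockets, cores), selected through srun's --distribution option. The Python sketch below is a simplified model that applies one method per level to a uniform hierarchy. It ignores CPUs per task, the fcyclic method and actual CPU availability, and `distribute` and `place` are hypothetical names.

```python
def distribute(ntasks, nunits, method):
    """Map each of ntasks tasks to one of nunits units."""
    if method == "block":
        per_unit = -(-ntasks // nunits)  # ceiling division: fill a unit before moving on
        return [t // per_unit for t in range(ntasks)]
    if method == "cyclic":
        return [t % nunits for t in range(ntasks)]  # round-robin across units
    raise ValueError(f"unsupported method: {method}")

def place(ntasks, shape, methods):
    """Place tasks on a (nodes, sockets, cores) hierarchy, one method per level.

    Returns a (node, socket, core) tuple for each task index.
    """
    groups = {(): list(range(ntasks))}
    for level_size, method in zip(shape, methods):
        next_groups = {}
        for prefix, tasks in groups.items():
            units = distribute(len(tasks), level_size, method)
            for task, unit in zip(tasks, units):
                next_groups.setdefault(prefix + (unit,), []).append(task)
        groups = next_groups
    placement = [None] * ntasks
    for prefix, tasks in groups.items():
        for task in tasks:
            placement[task] = prefix
    return placement

if __name__ == "__main__":
    # One node, 2 sockets, 2 cores per socket, 4 tasks.
    print(place(4, (1, 2, 2), ("block", "block", "block")))
    # [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
    print(place(4, (1, 2, 2), ("block", "cyclic", "block")))
    # [(0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 1)]
```

Cyclic placement at the socket level spreads consecutive tasks across sockets, while block placement at the core level then packs each socket's tasks onto adjacent cores.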
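MsgAggregationParams trades a small delay for fewer messages at the slurmctld: intermediate nodes collect messages and forward them as one. The Python sketch below models a single collector that flushes its buffer when it holds a set number of messages or when the oldest buffered message reaches a set age. It assumes those two limits correspond to the WindowMsgs and WindowTime settings; that mapping is an assumption, `aggregate` is a hypothetical helper, and the real reverse tree is multi-level and timer-driven.

```python
def aggregate(arrivals_ms, window_msgs, window_time_ms):
    """Group message arrival times (ms) into batches flushed by count or age.

    Each returned batch stands for one aggregated message sent upstream.
    """
    batches, current, first = [], [], None
    for t in sorted(arrivals_ms):
        if current and t - first >= window_time_ms:
            batches.append(current)  # oldest message reached the age limit
            current = []
        if not current:
            first = t
        current.append(t)
        if len(current) >= window_msgs:
            batches.append(current)  # buffer reached the count limit
            current = []
    if current:
        batches.append(current)  # flush whatever remains
    return batches

if __name__ == "__main__":
    arrivals = [0, 1, 2, 3, 50]  # example epilog-complete arrival times
    batches = aggregate(arrivals, window_msgs=3, window_time_ms=10)
    print(len(arrivals), "->", len(batches))  # 5 -> 3
```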
--
Morris "Moe" Jette
CTO, SchedMD LLC
Commercial Slurm Development and Support