We are pleased to announce the availability of Slurm version 18.08.4.
This includes over 70 fixes since 18.08.3 was released in October.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

* Changes in Slurm 18.08.4
==========================
 -- burst_buffer/cray - avoid launching a job that would be immediately cancelled due to a DataWarp failure.
 -- Fix message sent to user to display preempted instead of time limit when a job is preempted.
 -- Fix memory leak when a failure happens processing a node's gres config.
 -- Improve error message when failures happen processing a node's gres config.
 -- When building rpms ignore redundant standard rpaths and insecure relative rpaths, for RHEL based distros which use "check-rpaths" tool.
 -- Don't skip jobs in scontrol hold.
 -- Avoid locking the job_list when unneeded.
 -- Allow --cpu-bind=verbose to be used with SLURM_HINT environment variable.
 -- Make it so fixing runaway jobs will not alter the same job requeued when not runaway.
 -- Avoid checking state when searching for runaway jobs.
 -- Remove redundant check for end time of job when searching for runaway jobs.
 -- Make sure that we properly check for runaway jobs where another job might have the same id (for example, if a job was requeued) by also checking the submit time.
 -- Add scontrol update job ResetAccrueTime to clear a job's time previously accrued for priority.
 -- cons_res: Delay exiting cr_job_test until after cores/cpus are calculated and distributed.
 -- Fix bug where binary in cwd would trump binary in PATH with test_exec.
 -- Fix check to test printf("%s\n", NULL); to not require -Wno-format-truncation CFLAG.
 -- Fix JobAcctGatherParams=UsePss to report the correct usage.
 -- Fix minor memory leak in pmix plugin.
 -- Fix minor memory leak in slurmctld when reading configuration.
 -- Handle return codes correctly from pthread_* functions.
 -- Fix minor memory leak when a slurmd is unable to contact a slurmctld when trying to register.
 -- Fix sreport sizesbyaccount report when using Flatview and accounts.
 -- Fix incorrect shift when dealing with node weights and scheduling.
 -- libslurm/perl - Fix segfault caused by incorrect hv_to_slurm_ctl_conf.
 -- Add qos and assoc options to confirmation dialogs.
 -- Handle updating identical license or partition information correctly.
 -- Make sure accounts and QOS' are all lower case to match documentation when read in from the slurm.conf file.
 -- Don't consider partitions without enough nodes in reservation, main scheduler.
 -- Set SLURM_NTASKS correctly if having to determine from other options.
 -- Removed GCP scripts from contribs. Now located at: https://github.com/SchedMD/slurm-gcp.
 -- Don't check existence of srun --prolog or --epilog executables when set to "none" and SLURM_TEST_EXEC is used.
 -- Add "P" suffix support to job and step tres specifications.
 -- When doing a reconfigure handle QOS' GrpJobsAccrue correctly.
 -- Remove unneeded extra parentheses from sh5util.
 -- Fix jobacct_gather/cgroup to work correctly when more than one task is started on a node.
 -- If requesting --ntasks-per-node with no tasks set tasks correctly.
 -- Accept modifiers for TRES originally added in 6f0342e0358.
 -- Don't remove reservation on slurmctld restart if nodes are removed from configuration.
 -- Fix bad xfree in task/cgroup.
 -- Fix removing counters if a job array isn't subject to limits and is canceled while pending.
 -- Make sure SLURM_NTASKS_PER_NODE is set correctly when env is overwritten by the command line.
 -- Clean up step on a failed node correctly.
 -- mpi/pmix: Fixed the logging of collective state.
 -- mpi/pmix: Make multi-slurmd work correctly when using ring communication.
 -- mpi/pmix: Fix double invocation of the PMIx lib fence callback.
 -- mpi/pmix: Remove unneeded libpmix callback drop in tree-based coll.
 -- Fix race condition in route/topology when the slurmctld is reconfigured.
 -- In route/topology validate the slurmctld doesn't try to initialize the node system.
 -- Fix issue when requesting invalid gres.
 -- Validate job_ptr in backfill before restoring preempt state.
 -- Fix issue when job's environment is minimal and only contains variables Slurm is going to replace internally.
 -- When handling runaway jobs remove all usage before rollup to remove any time that wasn't existent instead of just updating lines that have time with a lesser time.
 -- salloc - set SLURM_NTASKS_PER_CORE and SLURM_NTASKS_PER_SOCKET in the environment if the corresponding command line options are used.
 -- slurmd - fix handling of the -f flag to specify alternate config file locations.
 -- Fix scheduling logic to avoid using nodes that require a reboot for KNL node change when possible.
 -- Fix scheduling logic bug. There should have been a test for _not_ NODE_SET_REBOOT to continue.
 -- Fix a scheduling logic bug with respect to XOR operation support when there are down nodes.
 -- If there is a constraint construct of the form "[...&...]" then an error is generated if more than one of those specifications contains KNL NUMA or MCDRAM modes.
 -- Fix stepd segfault race if slurmctld hasn't registered with the launching slurmd yet delivering its TRES list.
 -- Add SchedulerParameters option of bf_ignore_newly_avail_nodes to avoid scheduling lower priority jobs on resources that become available during the backfill scheduling cycle when bf_continue is enabled.
 -- Decrement message_connections in stepd code on error path correctly.
 -- Decrease an error message to be debug.
 -- Fix missing suffixes in squeue.
 -- pam_slurm_adopt - send an error message to the user if no Slurm jobs can be located on the node.
 -- Run SlurmctldPrimaryOffProg when the primary slurmctld process shuts down.
 -- job_submit/lua: Add several slurmctld return codes.
 -- job_submit/lua: Add user/group info to jobs.
 -- Fix formatting issues when printing uint64_t.
 -- Bump RLIMIT_NOFILE for daemons in systemd services.
 -- Expand %x in job name in 'scontrol show job'.
 -- salloc/sbatch/srun - print warning if mutually exclusive options of --mem and --mem-per-cpu are both set.
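
Several entries above concern runaway jobs, in particular matching on submit time as well as job id so that a requeued job sharing the id is not mistaken for the runaway record. The idea can be illustrated with a minimal Python sketch. This is not Slurm's implementation: JobRecord, find_runaway, and the convention that end_time == 0 means "never closed" are hypothetical names and assumptions made only for illustration.

```python
from typing import List, NamedTuple


class JobRecord(NamedTuple):
    # Hypothetical record layout, for illustration only.
    job_id: int
    submit_time: int  # Unix timestamp of submission
    end_time: int     # assumed convention: 0 means the record was never closed


def find_runaway(db_records: List[JobRecord],
                 active_jobs: List[JobRecord]) -> List[JobRecord]:
    """Return unclosed accounting records that match no active job.

    A record is matched on the (job_id, submit_time) pair rather than
    on job_id alone, so a requeued job reusing the id does not hide an
    older unclosed record, and is not itself flagged as runaway.
    """
    known = {(job.job_id, job.submit_time) for job in active_jobs}
    return [
        rec for rec in db_records
        if rec.end_time == 0 and (rec.job_id, rec.submit_time) not in known
    ]


# Job 100 was requeued: the old record (submitted at 1000) was never
# closed, while the requeued instance (submitted at 2000) is still active.
db = [
    JobRecord(100, 1000, 0),
    JobRecord(100, 2000, 0),
    JobRecord(101, 1500, 1600),
]
active = [JobRecord(100, 2000, 0)]
runaway = find_runaway(db, active)
```

With matching on job_id alone, job 100 would look active and the stale record would be missed. Keying on the pair flags only the record submitted at 1000 and leaves the requeued job untouched.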