Slurm version 17.02.7 contains about 35 bug fixes developed over the past six weeks.
Slurm version 17.11.0-pre2 is the second pre-release of version 17.11, to be released in November 2017.

Slurm downloads are available from http://www.schedmd.com/#repos

Details about the changes in each version are listed below.

* Changes in Slurm 17.02.7
==========================
 -- Fix deadlock if requesting to create more than 10000 reservations.
 -- Fix potential memory leak when creating partition name.
 -- Execute the HealthCheckProgram once when the slurmd daemon starts rather than executing repeatedly until an exit code of 0 is returned.
 -- Set job/step start and end times to 0 when using --truncate and start > end.
 -- Make srun --pty option ignore EINTR, allowing windows to resize.
 -- When resuming a node, only send one message to the slurmdbd.
 -- Modify srun --pty option to use the configured SrunPortRange range.
 -- Fix issue with whole gres not being printed out with Slurm tools.
 -- Fix issue where multiple jobs from an array were prevented from starting.
 -- Fix possible slurmctld abort with use of the salloc/sbatch/srun --gres-flags=enforce-binding option.
 -- Fix race condition when using jobacct_gather/cgroup where the memory of the step wasn't always gathered correctly.
 -- Better debug output when the slurmdbd queue is filling up in the slurmctld.
 -- Fix truncation of scontrol show config output.
 -- Serialize updates from the dbd to the slurmctld.
 -- Fix memory leak in slurmctld when the agent queue to the DBD has filled up.
 -- CRAY - Throttle step creation if trying to create too many steps at once.
 -- If failing after switch_g_job_init has run, make sure switch_g_job_fini is called.
 -- Fix minor memory leak if launch fails in the slurmstepd.
 -- Fix issue with UnkillableStepProgram when a step was in an ending state.
 -- Fix bug when tracking multiple simultaneously spawned ping cycles.
 -- jobcomp/elasticsearch plugin now saves state of pending requests on slurmctld daemon shutdown so they can be recovered on restart.
 -- Fix issue when using an alternate munge key while communicating on a persistent connection.
 -- Document inconsistent behavior of the GroupUpdateForce option.
 -- Fix bug in selection of GRES bound to specific CPUs where the GRES count is 2 or more. Previous logic could allocate CPUs not available to the job.
 -- Increase buffer to handle long /proc/<pid>/stat output so that Slurm can read the correct RSS value and take action on jobs using more memory than requested (see the example below).
 -- Fix srun jobs that can run immediately to run in the highest priority partition when multiple partitions are listed. scontrol show jobs can potentially show the partition list in priority order.
 -- Fix starting the controller if the StateSaveLocation path didn't exist.
 -- Fix inherited association 'max' TRES limits combining multiple limits in the tree.
 -- Sort TRES IDs on limits when getting them from the database.
 -- Fix issue with pmi[2|x] when TreeWidth=1.
 -- Correct buffer size used in determining specialized cores to avoid possible truncation of the core specification and not reserving the specified cores.
 -- Close race condition on Slurm structures when setting DebugFlags.
 -- Make the cray/switch plugin pick up new DebugFlags on a reconfigure.
 -- Fix incorrect lock levels when creating or updating a reservation.
 -- Fix overlapping reservation resize.
 -- Add logic to help support Dell KNL systems where syscfg is different from the normal Intel syscfg.
 -- CRAY - Fix BB to handle type= correctly, a regression in 17.02.6.
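The /proc/<pid>/stat change above is about tolerating very long stat lines when extracting a job's resident set size. As an illustration only (Slurm's own implementation is in C and is not reproduced here), a minimal Python sketch of the same parsing idea is shown below; the function name and the page-size conversion are assumptions made for the example.

    # Illustrative sketch: read the RSS of a process from /proc/<pid>/stat.
    # Per proc(5), the command name (field 2) is wrapped in parentheses and
    # may itself contain spaces, so split on the last ')' before counting
    # fields. rss is field 24 overall, i.e. index 21 of the fields after
    # the comm.
    import os

    def read_rss_bytes(pid):
        with open("/proc/%d/stat" % pid) as f:
            data = f.read()                      # read the full line, however long
        fields = data.rsplit(")", 1)[1].split()  # fields 3..N
        rss_pages = int(fields[21])              # field 24: resident set size in pages
        return rss_pages * os.sysconf("SC_PAGE_SIZE")

    if __name__ == "__main__":
        print(read_rss_bytes(os.getpid()))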

* Changes in Slurm 17.11.0pre2
==============================
 -- Initial work for heterogeneous job support (complete solution in v17.11):
    * Modified the salloc, sbatch and srun commands to parse the command line, job script and environment variables to recognize requests for heterogeneous jobs. The same commands were also modified to set environment variables describing each component of the heterogeneous job.
    * Modified job allocate, batch job submit and job "will-run" requests to pass a list of job specifications and get a list of responses.
    * Modify the slurmctld daemon to process a heterogeneous job request and create multiple job records as needed.
    * Added new fields to the job record: pack_job_id, pack_job_offset and pack_job_set (set of job IDs). Added these to the slurmctld state save/restore logic and to the job information reported.
    * Display the new job fields in "scontrol show job" output.
    * Modify the squeue command to display heterogeneous job records using the "#+#" format. The squeue --job=# output lists all components of a heterogeneous job (see the example below).
    * Modify scancel logic to cancel all components of a heterogeneous job with a single request/RPC.
    * Added a "HeteroJobs" value to the DebugFlags configuration parameter.
    * Job requeue and suspend/resume modified to operate on all components of a heterogeneous job with a single request/RPC.
    * New web page added to describe heterogeneous jobs.
    * Descriptions of the new API added to man pages.
    * Modified email notifications to only operate on the first job component.
    * Purge heterogeneous job records at the same time rather than by individual components.
    * Modified logic for heterogeneous jobs submitted to multiple clusters ("--clusters=...") so the job will be routed to the cluster that is expected to start all components earliest.
    * Modified srun to create multiple job steps for heterogeneous job allocations.
    * Modified the launch plugin to accept a pointer to a job step options structure rather than working from a single/common data structure.
 -- Improve the backfill scheduling algorithm with respect to starting jobs as soon as possible while avoiding advanced reservations.
 -- Add URG as an option to 'scancel --signal'.
 -- Check that the buffer returned from slurm_persist_msg_pack() isn't NULL.
 -- Modify all daemons to re-open their log files on receipt of the SIGUSR2 signal. This is much faster than using SIGHUP to re-read the configuration file and rebuild various tables.
 -- Add the PrivateData=events configuration parameter.
 -- Work for heterogeneous job support (complete solution in v17.11):
    * Add a pointer to the job option structure to the job_step_create_allocation() function used by srun.
    * Parallelize task launch for heterogeneous job allocations (initial work).
    * Make the packjobid, packjoboffset and packjobidset fields available in squeue output.
    * Modify the smap command to display heterogeneous job records using the "#+#" format.
    * Add srun --pack-group and --mpi-combine options to control job step launch behaviour (not fully implemented).
    * Add the pack job component ID to srun --label output (e.g. "P0 1:" for job component 0 and task 1).
    * jobcomp/elasticsearch: Add pack_job_id and pack_job_offset fields.
    * sview: Modified to display pack job information.
    * Major re-write of the task state container logic to support a list of containers rather than one container per srun command.
    * Add some regression tests.
    * Add srun pack job environment variables when performing a job allocation.
 -- Set Reason=dependency over Reason=JobArrayTaskLimit for pending jobs.
 -- Add slurm.conf configuration parameters SlurmctldSyslogDebug and SlurmdSyslogDebug to control which messages from the slurmctld and slurmd daemons get written to syslog.
 -- Add slurmdbd.conf configuration parameter DebugLevelSyslog to control which messages from the slurmdbd daemon get written to syslog.
 -- Fix handling of the GroupUpdateForce option.
 -- Work for heterogeneous job support (complete solution in v17.11):
    * Add support to sched/backfill for concurrent allocation of all pack job components, including support of the --time-min option.
    * Defer initiation of a heterogeneous job until all components can be started at the same time, taking into consideration association and QOS limits for the job as a whole.
    * Perform a limit check on the heterogeneous job as a whole at submit time to reject jobs that will never be able to run.
    * Add pack_job_id and pack_job_offset to the accounting database.
    * Modified sacct to accept pack job ID specifications using the "#+#" notation (see the example below).
    * Modified sstat to accept pack job ID specifications using the "#+#" notation.
 -- Clear a job's "wait reason" value of "BeginTime" after that time has passed. Previously a reason of "BeginTime" could be reported long after the job's requested begin time had passed.
 -- Split group_info in slurm_ctl_conf_t into group_force and group_time.
 -- Work for heterogeneous job support (complete solution in v17.11):
    * Fix I/O race condition on step termination for srun launching multiple pack job groups.
    * If the prolog is running when attempting to signal a step, return EAGAIN and retry rather than simply returning SLURM_ERROR and aborting.
    * Modify the launch/slurm plugin to signal all components of a pack job rather than just one (modified to use a list of step context records).
    * Add logic to support the srun --mpi-combine option.
    * Set up debugger data structures.
    * Disable cancellation of an individual component while the job is pending.
    * Modify scontrol job hold/release and update to operate with heterogeneous job ID specifications (e.g. "scontrol hold 123+4").
    * If srun lacks an application specification for some component, the next one specified will be used for earlier components.
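As an illustration of the heterogeneous job command-line behaviour noted above (squeue --job=# listing all components in "#+#" form, a single scancel request acting on every component, and the new URG value for 'scancel --signal'), the following Python sketch simply drives the documented commands; the wrapper functions and the job ID 1234 are assumptions made for the example.

    # Illustrative sketch: inspect and cancel a heterogeneous ("pack") job
    # using the commands described in these notes. Assumes the Slurm 17.11
    # commands are on PATH; job ID 1234 is purely hypothetical.
    import subprocess

    def show_hetero_job(job_id):
        # squeue --job=<id> lists every component of a heterogeneous job,
        # displayed in the "#+#" format.
        out = subprocess.run(["squeue", "--job=%s" % job_id],
                             capture_output=True, text=True)
        return out.stdout

    def cancel_hetero_job(job_id, signal=None):
        # One scancel request cancels (or signals) all components of the
        # job; URG is now accepted by 'scancel --signal'.
        cmd = ["scancel"]
        if signal:
            cmd.append("--signal=%s" % signal)
        cmd.append(str(job_id))
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        print(show_hetero_job(1234))
        cancel_hetero_job(1234, signal="URG")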
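Similarly, the pack job ID notation accepted by sacct, sstat and scontrol can be exercised directly. In this sketch the ID "123+4" (component 4 of heterogeneous job 123) is purely an example, and the small run() helper is an assumption for brevity.

    # Illustrative sketch: use the "#+#" pack job notation with the
    # accounting and control commands mentioned in these notes.
    import subprocess

    def run(cmd):
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    # sacct and sstat accept pack job IDs in "#+#" form.
    print(run(["sacct", "-j", "123+4"]))
    print(run(["sstat", "-j", "123+4"]))

    # scontrol hold/release/update also operate on "#+#" IDs.
    subprocess.run(["scontrol", "hold", "123+4"], check=True)
    subprocess.run(["scontrol", "release", "123+4"], check=True)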
