Hi,

I maintain my research group's Slurm cluster. Since upgrading to Ubuntu 22.04 and Slurm 21.08.5, I have been unable to launch jobs; every attempt fails with the following error:

$ srun --nodes=1 --ntasks-per-node=1 -c 1 --mem-per-cpu 1G --time=01:00:00 --pty -p amd -w cn02 --pty bash -i
srun: error: task 0 launch failed: Plugin initialization failed

Strangely, I cannot find any indication of this problem in the logs (attached below). The problem appears to be related to the task/cgroup plugin, since it does not occur when I disable that plugin.
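When slurmd runs in the foreground with -D, the terminal shows the daemon's own messages, but the slurmstepd process it forks for each step (see the _forkexec_slurmstepd lines in the log below) presumably writes to the configured log file, /var/log/slurm/slurmd.log, so a plugin-initialization error may only appear there. That log destination is an assumption worth verifying. A minimal sketch for pulling error- and cgroup-related lines out of that file right after a failed srun; step_errors is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print error-level and cgroup-related lines read on stdin.
step_errors() {
  grep -Ei 'error|cgroup'
}

# Path taken from the "Logfile" entry in the slurmd dump; override with LOG=...
# Assumption: slurmstepd writes its messages to this same file.
LOG=${LOG:-/var/log/slurm/slurmd.log}

if [ -r "$LOG" ]; then
  # Show only the most recent matches, e.g. right after a failed srun.
  step_errors < "$LOG" | tail -n 50
else
  echo "cannot read $LOG" >&2
fi
```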

After reading the documentation, I tried adding the kernel parameters cgroup_enable=memory and swapaccount=1, but the problem persisted.
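Kernel parameters only take effect after a reboot, and they do not by themselves decide which cgroup hierarchy systemd mounts at /sys/fs/cgroup (the CgroupMountpoint in the cgroup.conf below). Ubuntu 22.04 mounts the unified cgroup v2 hierarchy by default, so it may be worth confirming both points on cn02. A minimal sketch, assuming GNU coreutils stat; cgroup_mode is a hypothetical helper, and systemd.unified_cgroup_hierarchy is a systemd boot parameter not mentioned above, included only because it is the switch that selects the hierarchy:

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup to a cgroup layout.
cgroup_mode() {
  case "$1" in
    cgroup2fs) echo "unified cgroup v2" ;;
    tmpfs)     echo "cgroup v1 or hybrid" ;;
    *)         echo "unknown ($1)" ;;
  esac
}

# GNU stat: -f queries the filesystem, %T prints its type name.
fstype=$(stat -f -c %T /sys/fs/cgroup 2>/dev/null || echo missing)
echo "/sys/fs/cgroup: $(cgroup_mode "$fstype")"

# Check whether the boot parameters are on the running kernel's command line.
tr ' ' '\n' < /proc/cmdline \
  | grep -E '^(cgroup_enable|swapaccount|systemd\.unified_cgroup_hierarchy)=' \
  || echo "none of these parameters are set on the running kernel"
```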

I would be very grateful for any advice on where to look, since I have no idea how to investigate this issue further.

Thanks a lot in advance.

Best,

Tim

###
# Slurm cgroup support configuration file
###
CgroupAutomount=yes
CgroupMountpoint=/sys/fs/cgroup
ConstrainKmemSpace=no
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
# This will be necessary for controlling GPU access
ConstrainDevices=yes
#
# slurmd -D -vvvvvv --conf-server nas:6817
slurmd: debug:  Log file re-opened
slurmd: debug2: hwloc_topology_init
slurmd: debug2: hwloc_topology_load
slurmd: debug2: hwloc_topology_export_xml
slurmd: debug:  CPUs:16 Boards:1 Sockets:1 CoresPerSocket:16 ThreadsPerCore:1
slurmd: debug4: CPU map[0]=>0 S:C:T 0:0:0
slurmd: debug4: CPU map[1]=>1 S:C:T 0:1:0
slurmd: debug4: CPU map[2]=>2 S:C:T 0:2:0
slurmd: debug4: CPU map[3]=>3 S:C:T 0:3:0
slurmd: debug4: CPU map[4]=>4 S:C:T 0:4:0
slurmd: debug4: CPU map[5]=>5 S:C:T 0:5:0
slurmd: debug4: CPU map[6]=>6 S:C:T 0:6:0
slurmd: debug4: CPU map[7]=>7 S:C:T 0:7:0
slurmd: debug4: CPU map[8]=>8 S:C:T 0:8:0
slurmd: debug4: CPU map[9]=>9 S:C:T 0:9:0
slurmd: debug4: CPU map[10]=>10 S:C:T 0:10:0
slurmd: debug4: CPU map[11]=>11 S:C:T 0:11:0
slurmd: debug4: CPU map[12]=>12 S:C:T 0:12:0
slurmd: debug4: CPU map[13]=>13 S:C:T 0:13:0
slurmd: debug4: CPU map[14]=>14 S:C:T 0:14:0
slurmd: debug4: CPU map[15]=>15 S:C:T 0:15:0
slurmd: debug3: _set_slurmd_spooldir: initializing slurmd spool directory `/var/spool/slurmd`
slurmd: debug2: hwloc_topology_init
slurmd: debug2: xcpuinfo_hwloc_topo_load: xml file (/var/spool/slurmd/hwloc_topo_whole.xml) found
slurmd: debug:  CPUs:16 Boards:1 Sockets:1 CoresPerSocket:16 ThreadsPerCore:1
slurmd: debug4: CPU map[0]=>0 S:C:T 0:0:0
slurmd: debug4: CPU map[1]=>1 S:C:T 0:1:0
slurmd: debug4: CPU map[2]=>2 S:C:T 0:2:0
slurmd: debug4: CPU map[3]=>3 S:C:T 0:3:0
slurmd: debug4: CPU map[4]=>4 S:C:T 0:4:0
slurmd: debug4: CPU map[5]=>5 S:C:T 0:5:0
slurmd: debug4: CPU map[6]=>6 S:C:T 0:6:0
slurmd: debug4: CPU map[7]=>7 S:C:T 0:7:0
slurmd: debug4: CPU map[8]=>8 S:C:T 0:8:0
slurmd: debug4: CPU map[9]=>9 S:C:T 0:9:0
slurmd: debug4: CPU map[10]=>10 S:C:T 0:10:0
slurmd: debug4: CPU map[11]=>11 S:C:T 0:11:0
slurmd: debug4: CPU map[12]=>12 S:C:T 0:12:0
slurmd: debug4: CPU map[13]=>13 S:C:T 0:13:0
slurmd: debug4: CPU map[14]=>14 S:C:T 0:14:0
slurmd: debug4: CPU map[15]=>15 S:C:T 0:15:0
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/gres_gpu.so
slurmd: debug:  gres/gpu: init: loaded
slurmd: debug3: Success.
slurmd: debug3: _merge_gres2: From gres.conf, using gpu:rtx2080:1:/dev/nvidia0
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/gpu_generic.so
slurmd: debug:  gpu/generic: init: init: GPU Generic plugin loaded
slurmd: debug3: Success.
slurmd: debug3: gres_device_major : /dev/nvidia0 major 195, minor 0
slurmd: Gres Name=gpu Type=rtx2080 Count=1
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/topology_none.so
slurmd: topology/none: init: topology NONE plugin loaded
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/route_default.so
slurmd: route/default: init: route default plugin loaded
slurmd: debug3: Success.
slurmd: debug2: Gathering cpu frequency information for 16 cpus
slurmd: debug:  Resource spec: No specialized cores configured by default on this node
slurmd: debug:  Resource spec: Reserved system memory limit not configured for this node
slurmd: debug3: NodeName    = cn02
slurmd: debug3: TopoAddr    = cn02
slurmd: debug3: TopoPattern = node
slurmd: debug3: ClusterName = iascluster
slurmd: debug3: Confile     = `/var/spool/slurmd/conf-cache/slurm.conf'
slurmd: debug3: Debug       = 5
slurmd: debug3: CPUs        = 16 (CF: 16, HW: 16)
slurmd: debug3: Boards      = 1  (CF:  1, HW:  1)
slurmd: debug3: Sockets     = 1  (CF:  1, HW:  1)
slurmd: debug3: Cores       = 16 (CF: 16, HW: 16)
slurmd: debug3: Threads     = 1  (CF:  1, HW:  1)
slurmd: debug3: UpTime      = 2377 = 00:39:37
slurmd: debug3: Block Map   = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
slurmd: debug3: Inverse Map = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
slurmd: debug3: RealMemory  = 64216
slurmd: debug3: TmpDisk     = 32108
slurmd: debug3: Epilog      = `(null)'
slurmd: debug3: Logfile     = `/var/log/slurm/slurmd.log'
slurmd: debug3: HealthCheck = `(null)'
slurmd: debug3: NodeName    = cn02
slurmd: debug3: Port        = 6818
slurmd: debug3: Prolog      = `(null)'
slurmd: debug3: TmpFS       = `/tmp'
slurmd: debug3: Public Cert = `(null)'
slurmd: debug3: Slurmstepd  = `/usr/sbin/slurmstepd'
slurmd: debug3: Spool Dir   = `/var/spool/slurmd'
slurmd: debug3: Syslog Debug  = 10
slurmd: debug3: Pid File    = `/var/run/slurmd.pid'
slurmd: debug3: Slurm UID   = 64030
slurmd: debug3: TaskProlog  = `(null)'
slurmd: debug3: TaskEpilog  = `(null)'
slurmd: debug3: TaskPluginParam = 0
slurmd: debug3: UsePAM      = 0
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/proctrack_cgroup.so
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/task_cgroup.so
slurmd: debug:  task/cgroup: init: Tasks containment cgroup plugin loaded
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
slurmd: debug:  auth/munge: init: Munge authentication plugin loaded
slurmd: debug3: Success.
slurmd: debug:  spank: opening plugin stack /var/spool/slurmd/conf-cache/plugstack.conf
slurmd: debug:  /var/spool/slurmd/conf-cache/plugstack.conf: 1: include "/etc/slurm/plugstack.conf.d/*.conf"
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/cred_munge.so
slurmd: cred/munge: init: Munge credential signature plugin loaded
slurmd: debug3: Success.
slurmd: debug3: slurmd initialization successful
slurmd: slurmd version 21.08.5 started
slurmd: debug3: finished daemonize
slurmd: debug3: cred_unpack: job 47 ctime:1686864959 revoked:1686864959 expires:1686865079
slurmd: debug3: not appending expired job 47 state
slurmd: debug3: destroying job 47 state
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/jobacct_gather_linux.so
slurmd: debug:  jobacct_gather/linux: init: Job accounting gather LINUX plugin loaded
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/job_container_none.so
slurmd: debug:  job_container/none: init: job_container none plugin loaded
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/prep_script.so
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/core_spec_none.so
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/switch_cray_aries.so
slurmd: debug:  switch Cray/Aries plugin loaded.
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/switch_none.so
slurmd: debug:  switch/none: init: switch NONE plugin loaded
slurmd: debug3: Success.
slurmd: debug3: Successfully opened slurm listen port 6818
slurmd: slurmd started on Thu, 15 Jun 2023 23:42:39 +0200
slurmd: CPUs=16 Boards=1 Sockets=1 Cores=16 Threads=1 Memory=64216 TmpDisk=32108 Uptime=2377 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/acct_gather_energy_none.so
slurmd: debug:  acct_gather_energy/none: init: AcctGatherEnergy NONE plugin loaded
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/acct_gather_profile_none.so
slurmd: debug:  acct_gather_Profile/none: init: AcctGatherProfile NONE plugin loaded
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/acct_gather_interconnect_none.so
slurmd: debug:  acct_gather_interconnect/none: init: AcctGatherInterconnect NONE plugin loaded
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/acct_gather_filesystem_none.so
slurmd: debug:  acct_gather_filesystem/none: init: AcctGatherFilesystem NONE plugin loaded
slurmd: debug3: Success.
slurmd: debug2: No acct_gather.conf file (/var/spool/slurmd/conf-cache/acct_gather.conf)
slurmd: debug:  _handle_node_reg_resp: slurmctld sent back 9 TRES.
slurmd: debug3: in the service_connection
slurmd: debug2: Start processing RPC: REQUEST_LAUNCH_TASKS
slurmd: debug2: Processing RPC: REQUEST_LAUNCH_TASKS
slurmd: launch task StepId=48.0 request from UID:1000 GID:1000 HOST:192.168.0.2 PORT:43296
slurmd: debug:  Checking credential with 444 bytes of sig data
slurmd: debug2: _insert_job_state: we already have a job state for job 48.  No big deal, just an FYI.
slurmd: debug3: _rpc_launch_tasks: call to _forkexec_slurmstepd
slurmd: debug3: slurmstepd rank 0 (cn02), parent rank -1 (NONE), children 0, depth 0, max_depth 0
slurmd: debug3: _rpc_launch_tasks: return from _forkexec_slurmstepd
slurmd: debug2: Finish processing RPC: REQUEST_LAUNCH_TASKS
slurmd: debug3: in the service_connection
slurmd: debug2: Start processing RPC: REQUEST_TERMINATE_JOB
slurmd: debug2: Processing RPC: REQUEST_TERMINATE_JOB
slurmd: debug:  _rpc_terminate_job: uid = 64030 JobId=48
slurmd: debug:  credential for job 48 revoked
slurmd: debug2: No steps in jobid 48 to send signal 18
slurmd: debug2: No steps in jobid 48 to send signal 15
slurmd: debug4: sent ALREADY_COMPLETE
slurmd: debug2: set revoke expiration for jobid 48 to 1686865488 UTS
slurmd: debug2: Finish processing RPC: REQUEST_TERMINATE_JOB
slurmctld  | slurmctld: debug:  slurmctld log levels: stderr=debug5 logfile=debug5 syslog=quiet
slurmctld  | slurmctld: debug:  Log file re-opened
slurmctld  | slurmctld: pidfile not locked, assuming no running daemon
slurmctld  | slurmctld: error: Configured MailProg is invalid
slurmctld  | slurmctld: debug:  slurmscriptd: Got ack from slurmctld, initialization successful
slurmctld  | slurmctld: debug:  _slurmscriptd_mainloop: started
slurmctld  | slurmctld: debug:  slurmctld: slurmscriptd fork()'d and initialized.
slurmctld  | slurmctld: debug4: eio: handling events for 1 objects
slurmctld  | slurmctld: debug3: Called _msg_readable
slurmctld  | slurmctld: debug:  _slurmctld_listener_thread: started listening to slurmscriptd
slurmctld  | slurmctld: debug4: eio: handling events for 1 objects
slurmctld  | slurmctld: debug3: Called _msg_readable
slurmctld  | slurmctld: slurmctld version 21.08.5 started on cluster iascluster
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/cred_munge.so
slurmctld  | slurmctld: cred/munge: init: Munge credential signature plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/etc/slurm//slurm.conf` as buf_t
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm//acct_gather.conf`, No such file or directory
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/etc/slurm//cgroup.conf` as buf_t
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm//cgroup_allowed_devices_file.conf`, No such file or directory
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm//ext_sensors.conf`, No such file or directory
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/etc/slurm//gres.conf` as buf_t
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm//job_container.conf`, No such file or directory
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm//knl_cray.conf`, No such file or directory
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm//knl_generic.conf`, No such file or directory
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/etc/slurm//plugstack.conf` as buf_t
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm//topology.conf`, No such file or directory
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/etc/slurm/slurm.conf` as buf_t
slurmctld  | slurmctld: debug3: _load_conf2list: config file slurm.conf exists
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm/acct_gather.conf`, No such file or directory
slurmctld  | slurmctld: debug3: _load_conf2list: config file acct_gather.conf does not exist
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/etc/slurm/cgroup.conf` as buf_t
slurmctld  | slurmctld: debug3: _load_conf2list: config file cgroup.conf exists
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm/cgroup_allowed_devices_file.conf`, No such file or directory
slurmctld  | slurmctld: debug3: _load_conf2list: config file cgroup_allowed_devices_file.conf does not exist
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm/cli_filter.lua`, No such file or directory
slurmctld  | slurmctld: debug3: _load_conf2list: config file cli_filter.lua does not exist
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm/ext_sensors.conf`, No such file or directory
slurmctld  | slurmctld: debug3: _load_conf2list: config file ext_sensors.conf does not exist
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/etc/slurm/gres.conf` as buf_t
slurmctld  | slurmctld: debug3: _load_conf2list: config file gres.conf exists
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm/helpers.conf`, No such file or directory
slurmctld  | slurmctld: debug3: _load_conf2list: config file helpers.conf does not exist
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm/job_container.conf`, No such file or directory
slurmctld  | slurmctld: debug3: _load_conf2list: config file job_container.conf does not exist
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm/knl_cray.conf`, No such file or directory
slurmctld  | slurmctld: debug3: _load_conf2list: config file knl_cray.conf does not exist
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm/oci.conf`, No such file or directory
slurmctld  | slurmctld: debug3: _load_conf2list: config file oci.conf does not exist
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/etc/slurm/plugstack.conf` as buf_t
slurmctld  | slurmctld: debug3: _load_conf2list: config file plugstack.conf exists
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm/topology.conf`, No such file or directory
slurmctld  | slurmctld: debug3: _load_conf2list: config file topology.conf does not exist
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/etc/slurm/slurm.conf` as buf_t
slurmctld  | slurmctld: debug3: _load_conf2list: config file slurm.conf exists
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm/cli_filter.lua`, No such file or directory
slurmctld  | slurmctld: debug3: _load_conf2list: config file cli_filter.lua does not exist
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/etc/slurm/plugstack.conf` as buf_t
slurmctld  | slurmctld: debug3: _load_conf2list: config file plugstack.conf exists
slurmctld  | slurmctld: debug:  create_mmap_buf: Failed to open file `/etc/slurm/topology.conf`, No such file or directory
slurmctld  | slurmctld: debug3: _load_conf2list: config file topology.conf does not exist
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/auth_munge.so
slurmctld  | slurmctld: debug:  auth/munge: init: Munge authentication plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/select_cons_tres.so
slurmctld  | slurmctld: select/cons_tres: common_init: select/cons_tres loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/select_linear.so
slurmctld  | slurmctld: select/linear: init: Linear node selection plugin loaded with argument 17
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/select_cray_aries.so
slurmctld  | slurmctld: select/cray_aries: init: Cray/Aries node selection plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/select_cons_res.so
slurmctld  | slurmctld: select/cons_res: common_init: select/cons_res loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/preempt_none.so
slurmctld  | slurmctld: preempt/none: init: preempt/none loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/acct_gather_energy_none.so
slurmctld  | slurmctld: debug:  acct_gather_energy/none: init: AcctGatherEnergy NONE plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/acct_gather_profile_none.so
slurmctld  | slurmctld: debug:  acct_gather_Profile/none: init: AcctGatherProfile NONE plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/acct_gather_interconnect_none.so
slurmctld  | slurmctld: debug:  acct_gather_interconnect/none: init: AcctGatherInterconnect NONE plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/acct_gather_filesystem_none.so
slurmctld  | slurmctld: debug:  acct_gather_filesystem/none: init: AcctGatherFilesystem NONE plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug2: No acct_gather.conf file (/etc/slurm/acct_gather.conf)
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/jobacct_gather_linux.so
slurmctld  | slurmctld: debug:  jobacct_gather/linux: init: Job accounting gather LINUX plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/prep_script.so
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/ext_sensors_none.so
slurmctld  | slurmctld: ext_sensors/none: init: ExtSensors NONE plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/accounting_storage_slurmdbd.so
slurmctld  | slurmctld: accounting_storage/slurmdbd: init: Accounting storage SLURMDBD plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug:  accounting_storage/slurmdbd: _connect_dbd_conn: Sent PersistInit msg
slurmctld  | slurmctld: debug4: accounting_storage/slurmdbd: _load_dbd_state: There is no state save file to open by name /var/spool/slurmctld/dbd.messages
slurmctld  | slurmctld: accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 6817 with slurmdbd
slurmctld  | slurmctld: debug2: assoc 98(younes, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 99(younes, younes) has direct parent of 98(younes, (null))
slurmctld  | slurmctld: debug2: user younes default acct is younes
slurmctld  | slurmctld: debug2: assoc 96(watson, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 97(watson, watson) has direct parent of 96(watson, (null))
slurmctld  | slurmctld: debug2: user watson default acct is watson
slurmctld  | slurmctld: debug2: assoc 94(vincent, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 95(vincent, vincent) has direct parent of 94(vincent, (null))
slurmctld  | slurmctld: debug2: user vincent default acct is vincent
slurmctld  | slurmctld: debug2: assoc 92(urain, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 93(urain, urain) has direct parent of 92(urain, (null))
slurmctld  | slurmctld: debug2: user urain default acct is urain
slurmctld  | slurmctld: debug2: assoc 90(toelle, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 91(toelle, toelle) has direct parent of 90(toelle, (null))
slurmctld  | slurmctld: debug2: user toelle default acct is toelle
slurmctld  | slurmctld: debug2: assoc 88(tiboni, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 89(tiboni, tiboni) has direct parent of 88(tiboni, (null))
slurmctld  | slurmctld: debug2: user tiboni default acct is tiboni
slurmctld  | slurmctld: debug2: assoc 86(tateo, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 87(tateo, tateo) has direct parent of 86(tateo, (null))
slurmctld  | slurmctld: debug2: user tateo default acct is tateo
slurmctld  | slurmctld: debug2: assoc 84(stud_zoller, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 85(stud_zoller, stud_zoller) has direct parent of 84(stud_zoller, (null))
slurmctld  | slurmctld: debug2: user stud_zoller default acct is stud_zoller
slurmctld  | slurmctld: debug2: assoc 82(stud_zach, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 83(stud_zach, stud_zach) has direct parent of 82(stud_zach, (null))
slurmctld  | slurmctld: debug2: user stud_zach default acct is stud_zach
slurmctld  | slurmctld: debug2: assoc 80(stud_mueller, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 81(stud_mueller, stud_mueller) has direct parent of 80(stud_mueller, (null))
slurmctld  | slurmctld: debug2: user stud_mueller default acct is stud_mueller
slurmctld  | slurmctld: debug2: assoc 78(stud_mittenbuehler, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 79(stud_mittenbuehler, stud_mittenbuehler) has direct parent of 78(stud_mittenbuehler, (null))
slurmctld  | slurmctld: debug2: user stud_mittenbuehler default acct is stud_mittenbuehler
slurmctld  | slurmctld: debug2: assoc 76(stud_lin, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 77(stud_lin, stud_lin) has direct parent of 76(stud_lin, (null))
slurmctld  | slurmctld: debug2: user stud_lin default acct is stud_lin
slurmctld  | slurmctld: debug2: assoc 74(stud_krohn, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 75(stud_krohn, stud_krohn) has direct parent of 74(stud_krohn, (null))
slurmctld  | slurmctld: debug2: user stud_krohn default acct is stud_krohn
slurmctld  | slurmctld: debug2: assoc 72(stud_kramer, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 73(stud_kramer, stud_kramer) has direct parent of 72(stud_kramer, (null))
slurmctld  | slurmctld: debug2: user stud_kramer default acct is stud_kramer
slurmctld  | slurmctld: debug2: assoc 70(stud_kappes, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 71(stud_kappes, stud_kappes) has direct parent of 70(stud_kappes, (null))
slurmctld  | slurmctld: debug2: user stud_kappes default acct is stud_kappes
slurmctld  | slurmctld: debug2: assoc 68(stud_hu, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 69(stud_hu, stud_hu) has direct parent of 68(stud_hu, (null))
slurmctld  | slurmctld: debug2: user stud_hu default acct is stud_hu
slurmctld  | slurmctld: debug2: assoc 66(stud_heeg, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 67(stud_heeg, stud_heeg) has direct parent of 66(stud_heeg, (null))
slurmctld  | slurmctld: debug2: user stud_heeg default acct is stud_heeg
slurmctld  | slurmctld: debug2: assoc 64(stud_hammacher, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 65(stud_hammacher, stud_hammacher) has direct parent of 64(stud_hammacher, (null))
slurmctld  | slurmctld: debug2: user stud_hammacher default acct is stud_hammacher
slurmctld  | slurmctld: debug2: assoc 62(stud_gomez, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 63(stud_gomez, stud_gomez) has direct parent of 62(stud_gomez, (null))
slurmctld  | slurmctld: debug2: user stud_gomez default acct is stud_gomez
slurmctld  | slurmctld: debug2: assoc 60(stud_gasche, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 61(stud_gasche, stud_gasche) has direct parent of 60(stud_gasche, (null))
slurmctld  | slurmctld: debug2: user stud_gasche default acct is stud_gasche
slurmctld  | slurmctld: debug2: assoc 58(stud_gao1, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 59(stud_gao1, stud_gao1) has direct parent of 58(stud_gao1, (null))
slurmctld  | slurmctld: debug2: user stud_gao1 default acct is stud_gao1
slurmctld  | slurmctld: debug2: assoc 56(stud_gao, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 57(stud_gao, stud_gao) has direct parent of 56(stud_gao, (null))
slurmctld  | slurmctld: debug2: user stud_gao default acct is stud_gao
slurmctld  | slurmctld: debug2: assoc 54(stud_doiz, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 55(stud_doiz, stud_doiz) has direct parent of 54(stud_doiz, (null))
slurmctld  | slurmctld: debug2: user stud_doiz default acct is stud_doiz
slurmctld  | slurmctld: debug2: assoc 52(stud_chemangui, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 53(stud_chemangui, stud_chemangui) has direct parent of 52(stud_chemangui, (null))
slurmctld  | slurmctld: debug2: user stud_chemangui default acct is stud_chemangui
slurmctld  | slurmctld: debug2: assoc 50(schneider, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 51(schneider, schneider) has direct parent of 50(schneider, (null))
slurmctld  | slurmctld: debug2: user schneider default acct is schneider
slurmctld  | slurmctld: debug2: assoc 48(reddi, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 49(reddi, reddi) has direct parent of 48(reddi, (null))
slurmctld  | slurmctld: debug2: user reddi default acct is reddi
slurmctld  | slurmctld: debug2: assoc 46(palenicek, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 47(palenicek, palenicek) has direct parent of 46(palenicek, (null))
slurmctld  | slurmctld: debug2: user palenicek default acct is palenicek
slurmctld  | slurmctld: debug2: assoc 44(liu, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 45(liu, liu) has direct parent of 44(liu, (null))
slurmctld  | slurmctld: debug2: user liu default acct is liu
slurmctld  | slurmctld: debug2: assoc 42(le, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 43(le, le) has direct parent of 42(le, (null))
slurmctld  | slurmctld: debug2: user le default acct is le
slurmctld  | slurmctld: debug2: assoc 40(kshirsagar, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 41(kshirsagar, kshirsagar) has direct parent of 40(kshirsagar, (null))
slurmctld  | slurmctld: debug2: user kshirsagar default acct is kshirsagar
slurmctld  | slurmctld: debug2: assoc 38(koert, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 39(koert, koert) has direct parent of 38(koert, (null))
slurmctld  | slurmctld: debug2: user koert default acct is koert
slurmctld  | slurmctld: debug2: assoc 36(klink, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 37(klink, klink) has direct parent of 36(klink, (null))
slurmctld  | slurmctld: debug2: user klink default acct is klink
slurmctld  | slurmctld: debug2: assoc 34(jauhri, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 35(jauhri, jauhri) has direct parent of 34(jauhri, (null))
slurmctld  | slurmctld: debug2: user jauhri default acct is jauhri
slurmctld  | slurmctld: debug2: assoc 32(jansonnie, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 33(jansonnie, jansonnie) has direct parent of 32(jansonnie, (null))
slurmctld  | slurmctld: debug2: user jansonnie default acct is jansonnie
slurmctld  | slurmctld: debug2: assoc 30(huang, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 31(huang, huang) has direct parent of 30(huang, (null))
slurmctld  | slurmctld: debug2: user huang default acct is huang
slurmctld  | slurmctld: debug2: assoc 28(hendawy, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 29(hendawy, hendawy) has direct parent of 28(hendawy, (null))
slurmctld  | slurmctld: debug2: user hendawy default acct is hendawy
slurmctld  | slurmctld: debug2: assoc 26(hansel, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 27(hansel, hansel) has direct parent of 26(hansel, (null))
slurmctld  | slurmctld: debug2: user hansel default acct is hansel
slurmctld  | slurmctld: debug2: assoc 24(gruner, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 25(gruner, gruner) has direct parent of 24(gruner, (null))
slurmctld  | slurmctld: debug2: user gruner default acct is gruner
slurmctld  | slurmctld: debug2: assoc 22(funk, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 23(funk, funk) has direct parent of 22(funk, (null))
slurmctld  | slurmctld: debug2: user funk default acct is funk
slurmctld  | slurmctld: debug2: assoc 20(deramo, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 21(deramo, deramo) has direct parent of 20(deramo, (null))
slurmctld  | slurmctld: debug2: user deramo default acct is deramo
slurmctld  | slurmctld: debug2: assoc 18(dam, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 19(dam, dam) has direct parent of 18(dam, (null))
slurmctld  | slurmctld: debug2: user dam default acct is dam
slurmctld  | slurmctld: debug2: assoc 16(chalvatzaki, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 17(chalvatzaki, chalvatzaki) has direct parent of 16(chalvatzaki, (null))
slurmctld  | slurmctld: debug2: user chalvatzaki default acct is chalvatzaki
slurmctld  | slurmctld: debug2: assoc 14(carvalho, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 15(carvalho, carvalho) has direct parent of 14(carvalho, (null))
slurmctld  | slurmctld: debug2: user carvalho default acct is carvalho
slurmctld  | slurmctld: debug2: assoc 12(boehm, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 13(boehm, boehm) has direct parent of 12(boehm, (null))
slurmctld  | slurmctld: debug2: user boehm default acct is boehm
slurmctld  | slurmctld: debug2: assoc 10(belousov, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 11(belousov, belousov) has direct parent of 10(belousov, (null))
slurmctld  | slurmctld: debug2: user belousov default acct is belousov
slurmctld  | slurmctld: debug2: assoc 8(bang, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 9(bang, bang) has direct parent of 8(bang, (null))
slurmctld  | slurmctld: debug2: user bang default acct is bang
slurmctld  | slurmctld: debug2: assoc 6(arenz, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 7(arenz, arenz) has direct parent of 6(arenz, (null))
slurmctld  | slurmctld: debug2: user arenz default acct is arenz
slurmctld  | slurmctld: debug2: assoc 4(alhafez, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 5(alhafez, alhafez) has direct parent of 4(alhafez, (null))
slurmctld  | slurmctld: debug2: user alhafez default acct is alhafez
slurmctld  | slurmctld: debug2: assoc 3(ias, (null)) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: assoc 100(ias, ias) has direct parent of 3(ias, (null))
slurmctld  | slurmctld: debug2: user ias default acct is ias
slurmctld  | slurmctld: debug2: assoc 2(root, root) has direct parent of 1(root, (null))
slurmctld  | slurmctld: debug2: user root default acct is root
REDACTED: removed usernames
slurmctld  | slurmctld: debug3: assoc 3(ias (null)) normalize = 0.020000 from 3(ias (null)) 1 / 50 = 0.020000
slurmctld  | slurmctld: debug3: assoc 100(ias ias) normalize = 1.000000 from 100(ias ias) 1 / 1 = 1.000000
slurmctld  | slurmctld: debug3: assoc 100(ias ias) normalize = 0.020000 from 3(ias (null)) 1 / 50 = 0.020000
slurmctld  | slurmctld: debug3: assoc 2(root root) normalize = 0.020000 from 2(root root) 1 / 50 = 0.020000
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/var/spool/slurmctld/assoc_usage` as buf_t
slurmctld  | slurmctld: debug3: Version in assoc_usage header is 9472
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/var/spool/slurmctld/qos_usage` as buf_t
slurmctld  | slurmctld: debug3: Version in qos_usage header is 9472
slurmctld  | slurmctld: debug3: found correct tres
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/switch_cray_aries.so
slurmctld  | slurmctld: debug:  switch Cray/Aries plugin loaded.
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/switch_none.so
slurmctld  | slurmctld: debug:  switch/none: init: switch NONE plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug:  Reading slurm.conf file: /etc/slurm/slurm.conf
slurmctld  | slurmctld: error: Ignoring obsolete FastSchedule=1 option. Please remove from your configuration.
slurmctld  | slurmctld: debug:  Reading cgroup.conf file /etc/slurm/cgroup.conf
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/topology_none.so
slurmctld  | slurmctld: topology/none: init: topology NONE plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug:  No DownNodes
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/var/spool/slurmctld/last_config_lite` as buf_t
slurmctld  | slurmctld: debug3: Version in last_conf_lite header is 9472
slurmctld  | slurmctld: debug:  slurmctld log levels: stderr=debug5 logfile=debug5 syslog=quiet
slurmctld  | slurmctld: debug:  Log file re-opened
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/jobcomp_none.so
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/sched_backfill.so
slurmctld  | slurmctld: sched: Backfill scheduler plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/route_default.so
slurmctld  | slurmctld: route/default: init: route default plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 1
slurmctld  | slurmctld: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 2
slurmctld  | slurmctld: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 3
slurmctld  | slurmctld: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 4
slurmctld  | slurmctld: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution: Resource temporarily unavailable, attempt number 5
slurmctld  | slurmctld: error: get_addr_info: getaddrinfo() failed: Temporary failure in name resolution
slurmctld  | slurmctld: error: slurm_set_addr: Unable to resolve "dgx-station"
slurmctld  | slurmctld: error: slurm_get_port: Address family '0' not supported
slurmctld  | slurmctld: error: _set_slurmd_addr: failure on dgx-station
slurmctld  | slurmctld: Warning: Note very large processing time from _set_slurmd_addr: usec=5011392 began=21:42:28.875
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/var/spool/slurmctld/node_state` as buf_t
slurmctld  | slurmctld: debug3: Version string in node_state header is PROTOCOL_VERSION
slurmctld  | slurmctld: Recovered state of 21 nodes
slurmctld  | slurmctld: Down nodes: cn07
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/var/spool/slurmctld/job_state` as buf_t
slurmctld  | slurmctld: debug3: Version string in job_state header is PROTOCOL_VERSION
slurmctld  | slurmctld: debug3: Job id in job_state header is 48
slurmctld  | slurmctld: debug5: assoc_mgr_fill_in_assoc: looking for assoc of user=(null)(0), acct=(null), cluster=(null), partition=(null)
slurmctld  | slurmctld: debug3: assoc_mgr_fill_in_assoc: found correct association of user=(null)(0), acct=(null), cluster=(null), partition=(null) to assoc=100 acct=ias
slurmctld  | slurmctld: Recovered JobId=47 Assoc=100
slurmctld  | slurmctld: debug3: Set job_id_sequence to 48
slurmctld  | slurmctld: Recovered information about 1 jobs
slurmctld  | slurmctld: select/cons_res: part_data_create_array: select/cons_res: preparing for 7 partitions
slurmctld  | slurmctld: debug3: TRES Weight: cpu = 16.000000 * 1.000000 = 16.000000
slurmctld  | slurmctld: debug3: TRES Weight: mem = 64000.000000 * 0.000015 = 0.976562
slurmctld  | slurmctld: debug3: TRES Weight: energy = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: node = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: fs/disk = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: vmem = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: pages = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: gres/gpu = 1.000000 * 10000000000.000000 = 10000000000.000000
slurmctld  | slurmctld: debug3: TRES Weighted: SUM(TRES) = 10000000016.976562
REDACTED: loads of TRES Weight logs
slurmctld  | slurmctld: debug:  Updating partition uid access list
slurmctld  | slurmctld: debug2: load_part_uid_allow_list: list updated, resetting last_part_update time
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/var/spool/slurmctld/resv_state` as buf_t
slurmctld  | slurmctld: debug3: Version string in resv_state header is PROTOCOL_VERSION
slurmctld  | slurmctld: Recovered state of 0 reservations
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/var/spool/slurmctld/trigger_state` as buf_t
slurmctld  | slurmctld: State of 0 triggers recovered
slurmctld  | slurmctld: read_slurm_conf: backup_controller not specified
slurmctld  | slurmctld: select/cons_res: select_p_reconfigure: select/cons_res: reconfigure
slurmctld  | slurmctld: select/cons_res: part_data_create_array: select/cons_res: preparing for 7 partitions
slurmctld  | slurmctld: Warning: Note very large processing time from read_slurm_conf: usec=5089477 began=21:42:28.866
slurmctld  | slurmctld: Running as primary controller
slurmctld  | slurmctld: debug:  No backup controllers, not launching heartbeat.
slurmctld  | slurmctld: debug2: accounting_storage/slurmdbd: clusteracct_storage_p_cluster_tres: Sending tres '1=580,2=4146000,3=0,4=21,5=610000000630,6=0,7=0,8=0,1001=66' for cluster
slurmctld  | slurmctld: debug:  Note large processing time from slurmdbd agent: full loop: usec=1725036 began=21:42:33.890
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/var/spool/slurmctld/fed_mgr_state` as buf_t
slurmctld  | slurmctld: debug3: Version in fed_mgr_state header is 9472
slurmctld  | slurmctld: debug:  No feds to retrieve from state
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/priority_multifactor.so
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/var/spool/slurmctld/priority_last_decay_ran` as buf_t
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/site_factor_none.so
slurmctld  | slurmctld: debug:  site_factor/none: init: init: NULL site_factor plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug2: site_factor_plugin_init: plugin site_factor/none loaded
slurmctld  | slurmctld: debug:  priority/multifactor: init: Priority MULTIFACTOR plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: No parameter for mcs plugin, default values set
slurmctld  | slurmctld: mcs: MCSParameters = (null). ondemand set.
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/mcs_none.so
slurmctld  | slurmctld: debug:  mcs/none: init: mcs none plugin loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug:  power_save module disabled, SuspendTime < 0
slurmctld  | slurmctld: debug3: _slurmctld_rpc_mgr pid = 29
slurmctld  | slurmctld: debug:  power_save mode not enabled
slurmctld  | slurmctld: debug2: slurmctld listening on 0.0.0.0:6817
slurmctld  | slurmctld: debug3: _slurmctld_background pid = 29
slurmctld  | slurmctld: debug4: priority/multifactor: _write_last_decay_ran: done writing time 1686865355
slurmctld  | slurmctld: debug2: Processing RPC: REQUEST_CONFIG from UID=0
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes cn02 in state: UNKNOWN
slurmctld  | slurmctld: debug3: Trying to load plugin /usr/lib/x86_64-linux-gnu/slurm-wlm/gres_gpu.so
slurmctld  | slurmctld: debug:  gres/gpu: init: loaded
slurmctld  | slurmctld: debug3: Success.
slurmctld  | slurmctld: debug:  validate_node_specs: node cn02 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for cn02 usec=225
slurmctld  | slurmctld: debug:  Spawning registration agent for cn[01,03-14],gn[01-06] 19 hosts
slurmctld  | slurmctld: SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=2
slurmctld  | slurmctld: debug:  sched: Running job scheduler for default depth.
slurmctld  | slurmctld: debug2: Spawning RPC agent for msg_type REQUEST_NODE_REGISTRATION_STATUS
slurmctld  | slurmctld: debug2: Tree head got back 0 looking for 19
slurmctld  | slurmctld: debug3: Tree sending to cn01
slurmctld  | slurmctld: debug3: Tree sending to cn04
slurmctld  | slurmctld: debug3: Tree sending to cn03
slurmctld  | slurmctld: debug3: Tree sending to cn05
slurmctld  | slurmctld: debug3: Tree sending to cn06
slurmctld  | slurmctld: debug3: Tree sending to cn07
slurmctld  | slurmctld: debug2: Tree head got back 1
slurmctld  | slurmctld: debug3: Tree sending to cn08
slurmctld  | slurmctld: debug3: Tree sending to cn09
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes cn03 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node cn03 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for cn03 usec=51
slurmctld  | slurmctld: debug2: Tree head got back 2
slurmctld  | slurmctld: debug3: Tree sending to cn10
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes cn04 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node cn04 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for cn04 usec=29
slurmctld  | slurmctld: debug3: Tree sending to cn11
slurmctld  | slurmctld: debug2: Tree head got back 3
slurmctld  | slurmctld: debug3: Tree sending to cn12
slurmctld  | slurmctld: debug3: Tree sending to cn13
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes cn01 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node cn01 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for cn01 usec=32
slurmctld  | slurmctld: debug3: Tree sending to cn14
slurmctld  | slurmctld: debug3: Tree sending to gn01
slurmctld  | slurmctld: debug2: Tree head got back 4
slurmctld  | slurmctld: debug3: Tree sending to gn02
slurmctld  | slurmctld: debug2: Tree head got back 5
slurmctld  | slurmctld: debug2: Tree head got back 6
slurmctld  | slurmctld: debug3: Tree sending to gn03
slurmctld  | slurmctld: debug2: Tree head got back 7
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes cn08 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node cn08 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for cn08 usec=46
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes cn05 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node cn05 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for cn05 usec=57
slurmctld  | slurmctld: debug3: Tree sending to gn04
slurmctld  | slurmctld: debug2: Tree head got back 8
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes cn09 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node cn09 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for cn09 usec=33
slurmctld  | slurmctld: debug2: Tree head got back 9
slurmctld  | slurmctld: debug3: Tree sending to gn05
slurmctld  | slurmctld: debug3: Tree sending to gn06
slurmctld  | slurmctld: debug2: Tree head got back 10
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes cn10 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node cn10 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for cn10 usec=46
slurmctld  | slurmctld: debug2: Tree head got back 11
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes cn11 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node cn11 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for cn11 usec=36
slurmctld  | slurmctld: debug2: Tree head got back 12
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes cn12 in state: UNKNOWN
slurmctld  | slurmctld: debug2: Tree head got back 13
slurmctld  | slurmctld: debug:  validate_node_specs: node cn12 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for cn12 usec=58
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes cn06 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node cn06 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for cn06 usec=46
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes cn14 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node cn14 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for cn14 usec=37
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes gn01 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node gn01 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for gn01 usec=29
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes cn13 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node cn13 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for cn13 usec=50
slurmctld  | slurmctld: debug2: Tree head got back 15
slurmctld  | slurmctld: debug2: Tree head got back 16
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes gn03 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node gn03 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for gn03 usec=30
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes gn04 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node gn04 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for gn04 usec=26
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes gn02 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node gn02 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for gn02 usec=40
slurmctld  | slurmctld: debug2: Tree head got back 17
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes gn05 in state: UNKNOWN
slurmctld  | slurmctld: debug:  validate_node_specs: node gn05 registered with 0 jobs
slurmctld  | slurmctld: debug2: _slurm_rpc_node_registration complete for gn05 usec=67
slurmctld  | slurmctld: debug2: Tree head got back 18
slurmctld  | slurmctld: debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from UID=0
slurmctld  | slurmctld: debug3: validate_node_specs: validating nodes gn06 in state: DRAINED
slurmctld  | slurmctld: error: _slurm_rpc_node_registration node=gn06: Invalid argument
slurmctld  | slurmctld: debug2: _slurm_connect: connect to 192.168.0.16:6818 in 2s: Connection timed out
slurmctld  | slurmctld: debug2: Error connecting slurm stream socket at 192.168.0.16:6818: Connection timed out
slurmctld  | slurmctld: debug:  sched: Running job scheduler for default depth.
slurmctld  | slurmctld: debug2: _slurm_connect: failed to connect to 192.168.0.16:6818: No route to host
slurmctld  | slurmctld: debug2: Error connecting slurm stream socket at 192.168.0.16:6818: No route to host
slurmctld  | slurmctld: debug3: problems with cn07
slurmctld  | slurmctld: debug2: Tree head got back 19
slurmctld  | slurmctld: agent/is_node_resp: node:cn07 RPC:REQUEST_NODE_REGISTRATION_STATUS : Communication connection failure
slurmctld  | slurmctld: debug2: node_did_resp cn03
slurmctld  | slurmctld: debug2: node_did_resp cn04
slurmctld  | slurmctld: debug2: node_did_resp cn01
slurmctld  | slurmctld: debug2: node_did_resp cn05
slurmctld  | slurmctld: debug2: node_did_resp cn10
slurmctld  | slurmctld: debug2: node_did_resp cn08
slurmctld  | slurmctld: debug2: node_did_resp cn09
slurmctld  | slurmctld: debug2: node_did_resp cn11
slurmctld  | slurmctld: debug2: node_did_resp cn13
slurmctld  | slurmctld: debug2: node_did_resp cn06
slurmctld  | slurmctld: debug2: node_did_resp gn01
slurmctld  | slurmctld: debug2: node_did_resp cn14
slurmctld  | slurmctld: debug2: node_did_resp cn12
slurmctld  | slurmctld: debug2: node_did_resp gn03
slurmctld  | slurmctld: debug2: node_did_resp gn02
slurmctld  | slurmctld: debug2: node_did_resp gn04
slurmctld  | slurmctld: debug2: node_did_resp gn05
slurmctld  | slurmctld: debug2: node_did_resp gn06
slurmctld  | slurmctld: debug2: Processing RPC: REQUEST_RESOURCE_ALLOCATION from UID=1000
slurmctld  | slurmctld: debug3: sched: Processing RPC: REQUEST_RESOURCE_ALLOCATION from uid=1000
slurmctld  | slurmctld: debug3: _set_hostname: Using auth hostname for alloc_node: nas
slurmctld  | slurmctld: debug3: JobDesc: user_id=1000 JobId=N/A partition=amd name=bash
slurmctld  | slurmctld: debug3:    cpus=1-4294967294 pn_min_cpus=1 core_spec=-1
slurmctld  | slurmctld: debug3:    Nodes=1-[1] Sock/Node=65534 Core/Sock=65534 Thread/Core=65534
slurmctld  | slurmctld: debug3:    pn_min_memory_cpu=1024 pn_min_tmp_disk=-1
slurmctld  | slurmctld: debug3:    immediate=0 reservation=(null)
slurmctld  | slurmctld: debug3:    features=(null) batch_features=(null) cluster_features=(null)
slurmctld  | slurmctld: debug3:    req_nodes=cn02 exc_nodes=(null)
slurmctld  | slurmctld: debug3:    time_limit=60-60 priority=-1 contiguous=0 shared=-1
slurmctld  | slurmctld: debug3:    kill_on_node_fail=-1 script=(null)
slurmctld  | slurmctld: debug3:    argv="bash"
slurmctld  | slurmctld: debug3:    stdin=(null) stdout=(null) stderr=(null)
slurmctld  | slurmctld: debug3:    work_dir=/var/log/slurm alloc_node:sid=nas:420825
slurmctld  | slurmctld: debug3:    power_flags=
slurmctld  | slurmctld: debug3:    resp_host=192.168.0.2 alloc_resp_port=39977 other_port=44137
slurmctld  | slurmctld: debug3:    dependency=(null) account=ias qos=(null) comment=(null)
slurmctld  | slurmctld: debug3:    mail_type=0 mail_user=(null) nice=0 num_tasks=1 open_mode=0 overcommit=-1 acctg_freq=(null)
slurmctld  | slurmctld: debug3:    network=(null) begin=Unknown cpus_per_task=1 requeue=-1 licenses=(null)
slurmctld  | slurmctld: debug3:    end_time= signal=0@0 wait_all_nodes=1 cpu_freq=
slurmctld  | slurmctld: debug3:    ntasks_per_node=1 ntasks_per_socket=-1 ntasks_per_core=-1 ntasks_per_tres=-1
slurmctld  | slurmctld: debug3:    mem_bind=0:(null) plane_size:65534
slurmctld  | slurmctld: debug3:    array_inx=(null)
slurmctld  | slurmctld: debug3:    burst_buffer=(null)
slurmctld  | slurmctld: debug3:    mcs_label=(null)
slurmctld  | slurmctld: debug3:    deadline=Unknown
slurmctld  | slurmctld: debug3:    bitflags=0x1800c000 delay_boot=4294967294
slurmctld  | slurmctld: debug5: assoc_mgr_fill_in_assoc: looking for assoc of user=(null)(1000), acct=ias, cluster=iascluster, partition=amd
slurmctld  | slurmctld: debug3: assoc_mgr_fill_in_assoc: found correct association of user=(null)(1000), acct=ias, cluster=iascluster, partition=amd to assoc=100 acct=ias
slurmctld  | slurmctld: debug3: found correct qos
slurmctld  | slurmctld: debug3: TRES Weight: cpu = 1.000000 * 1.000000 = 1.000000
slurmctld  | slurmctld: debug3: TRES Weight: mem = 1024.000000 * 0.000015 = 0.015625
slurmctld  | slurmctld: debug3: TRES Weight: energy = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: node = 1.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: fs/disk = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: vmem = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: pages = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: gres/gpu = 0.000000 * 10000000000.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weighted: SUM(TRES) = 1.015625
slurmctld  | slurmctld: debug2: priority/multifactor: priority_p_set: initial priority for job 48 is 13139
slurmctld  | slurmctld: debug2: found 6 usable nodes from config containing cn[08-14]
slurmctld  | slurmctld: debug2: _build_node_list: JobId=48 matched 0 nodes (gn[01-05]) due to job partition or features
slurmctld  | slurmctld: debug2: _build_node_list: JobId=48 matched 0 nodes (gn06) due to job partition or features
slurmctld  | slurmctld: debug2: _build_node_list: JobId=48 matched 0 nodes (dgx-station) due to job partition or features
slurmctld  | slurmctld: debug2: found 7 usable nodes from config containing cn[01-07]
slurmctld  | slurmctld: debug3: _pick_best_nodes: JobId=48 idle_nodes 20 share_nodes 21
slurmctld  | slurmctld: debug2: select/cons_res: select_p_job_test: select_p_job_test for JobId=48
slurmctld  | slurmctld: debug3: TRES Weight: cpu = 1.000000 * 1.000000 = 1.000000
slurmctld  | slurmctld: debug3: TRES Weight: mem = 1024.000000 * 0.000015 = 0.015625
slurmctld  | slurmctld: debug3: TRES Weight: energy = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: node = 1.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: fs/disk = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: vmem = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: pages = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: gres/gpu = 0.000000 * 10000000000.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weighted: SUM(TRES) = 1.015625
slurmctld  | slurmctld: debug3: select/cons_res: job_res_add_job: JobId=48 action:normal
slurmctld  | slurmctld: debug3: select/cons_res: job_res_add_job: adding JobId=48 to part amd row 0
slurmctld  | slurmctld: debug3: TRES Weight: cpu = 1.000000 * 1.000000 = 1.000000
slurmctld  | slurmctld: debug3: TRES Weight: mem = 1024.000000 * 0.000015 = 0.015625
slurmctld  | slurmctld: debug3: TRES Weight: energy = 18446744073709551616.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: node = 1.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: fs/disk = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: vmem = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: pages = 0.000000 * 0.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weight: gres/gpu = 0.000000 * 10000000000.000000 = 0.000000
slurmctld  | slurmctld: debug3: TRES Weighted: SUM(TRES) = 1.015625
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, qos normal grp_used_tres_run_secs(cpu) is 3600
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, qos normal grp_used_tres_run_secs(mem) is 3686400
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, qos normal grp_used_tres_run_secs(node) is 3600
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, qos normal grp_used_tres_run_secs(billing) is 3600
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, qos normal grp_used_tres_run_secs(fs/disk) is 0
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, qos normal grp_used_tres_run_secs(vmem) is 0
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, qos normal grp_used_tres_run_secs(pages) is 0
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, qos normal grp_used_tres_run_secs(gres/gpu) is 0
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 100(ias/ias/(null)) grp_used_tres_run_secs(cpu) is 3600
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 100(ias/ias/(null)) grp_used_tres_run_secs(mem) is 3686400
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 100(ias/ias/(null)) grp_used_tres_run_secs(node) is 3600
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 100(ias/ias/(null)) grp_used_tres_run_secs(billing) is 3600
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 100(ias/ias/(null)) grp_used_tres_run_secs(fs/disk) is 0
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 100(ias/ias/(null)) grp_used_tres_run_secs(vmem) is 0
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 100(ias/ias/(null)) grp_used_tres_run_secs(pages) is 0
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 100(ias/ias/(null)) grp_used_tres_run_secs(gres/gpu) is 0
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 3(ias/(null)/(null)) grp_used_tres_run_secs(cpu) is 3600
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 3(ias/(null)/(null)) grp_used_tres_run_secs(mem) is 3686400
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 3(ias/(null)/(null)) grp_used_tres_run_secs(node) is 3600
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 3(ias/(null)/(null)) grp_used_tres_run_secs(billing) is 3600
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 3(ias/(null)/(null)) grp_used_tres_run_secs(fs/disk) is 0
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 3(ias/(null)/(null)) grp_used_tres_run_secs(vmem) is 0
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 3(ias/(null)/(null)) grp_used_tres_run_secs(pages) is 0
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 3(ias/(null)/(null)) grp_used_tres_run_secs(gres/gpu) is 0
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(cpu) is 3600
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(mem) is 3686400
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(node) is 3600
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(billing) is 3600
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(fs/disk) is 0
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(vmem) is 0
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(pages) is 0
slurmctld  | slurmctld: debug2: acct_policy_job_begin: after adding JobId=48, assoc 1(root/(null)/(null)) grp_used_tres_run_secs(gres/gpu) is 0
slurmctld  | slurmctld: debug2: sched: JobId=48 allocated resources: NodeList=cn02
slurmctld  | slurmctld: sched: _slurm_rpc_allocate_resources JobId=48 NodeList=cn02 usec=1655
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/var/spool/slurmctld/job_state` as buf_t
slurmctld  | slurmctld: debug3: Writing job id 49 to header record of job_state file
slurmctld  | slurmctld: debug2: Processing RPC: REQUEST_JOB_READY from UID=1000
slurmctld  | slurmctld: debug2: _slurm_rpc_job_ready(48)=7 usec=1
slurmctld  | slurmctld: debug2: Processing RPC: REQUEST_JOB_STEP_CREATE from UID=1000
slurmctld  | slurmctld: debug3: StepDesc: user_id=1000 JobId=48 node_count=1-1 cpu_count=1 num_tasks=1
slurmctld  | slurmctld: debug3:    cpu_freq_gov=4294967294 cpu_freq_max=4294967294 cpu_freq_min=4294967294 relative=65534 task_dist=0x1 plane=1
slurmctld  | slurmctld: debug3:    node_list=cn02  constraints=(null)
slurmctld  | slurmctld: debug3:    host=nas port=43717 srun_pid=83463 name=bash network=(null) exclusive=yes
slurmctld  | slurmctld: debug3:    mem_per_cpu=1024 resv_port_cnt=65534 immediate=0 no_kill=no
slurmctld  | slurmctld: debug3:    overcommit=no time_limit=0
slurmctld  | slurmctld: debug3:    TRES_per_step=cpu:1
slurmctld  | slurmctld: debug3:    TRES_per_task=cpu:1
slurmctld  | slurmctld: debug3: step_layout cpus = 1 pos = 0
slurmctld  | slurmctld: debug:  laying out the 1 tasks on 1 hosts cn02 dist 1
slurmctld  | slurmctld: debug2: _group_cache_lookup_internal: no entry found for ias
slurmctld  | slurmctld: debug2: Processing RPC: REQUEST_COMPLETE_JOB_ALLOCATION from UID=1000
slurmctld  | slurmctld: debug3: Processing RPC details: REQUEST_COMPLETE_JOB_ALLOCATION for JobId=48 rc=0
slurmctld  | slurmctld: _job_complete: JobId=48 WEXITSTATUS 0
slurmctld  | slurmctld: debug2: Processing RPC: REQUEST_STEP_COMPLETE from UID=0
slurmctld  | slurmctld: debug3: select/cons_res: job_res_rm_job: JobId=48 action:normal
slurmctld  | slurmctld: debug3: select/cons_res: job_res_rm_job: removed JobId=48 from part amd row 0
slurmctld  | slurmctld: _job_complete: JobId=48 done
slurmctld  | slurmctld: debug2: _slurm_rpc_complete_job_allocation: JobId=48 
slurmctld  | slurmctld: debug2: Spawning RPC agent for msg_type SRUN_JOB_COMPLETE
slurmctld  | slurmctld: debug2: full switch release for JobId=48 StepId=0, nodes cn02
slurmctld  | slurmctld: debug2: Spawning RPC agent for msg_type SRUN_JOB_COMPLETE
slurmctld  | slurmctld: debug2: Spawning RPC agent for msg_type REQUEST_TERMINATE_JOB
slurmctld  | slurmctld: debug2: Tree head got back 0 looking for 1
slurmctld  | slurmctld: debug3: Tree sending to cn02
slurmctld  | slurmctld: debug2: Tree head got back 1
slurmctld  | slurmctld: debug2: node_did_resp cn02
slurmctld  | slurmctld: debug:  sched: Running job scheduler for default depth.
slurmctld  | slurmctld: debug:  Note large processing time from slurmdbd agent: full loop: usec=1721846 began=21:42:48.825
slurmctld  | slurmctld: debug3: create_mmap_buf: loaded file `/var/spool/slurmctld/job_state` as buf_t
slurmctld  | slurmctld: debug3: Writing job id 49 to header record of job_state file
