Hi all,

We are setting up a simple Slurm cluster (version 23.11.6). We are following 
the documentation and, for now, testing without accounting or slurmdbd. The 
slurm.conf is really simple:
```
ClusterName=Develop
SlurmctldHost=head

# Slurm configuration
AuthType=auth/munge
CryptoType=crypto/munge
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld

# Nodes
NodeName=worker1 CoresPerSocket=2 Sockets=1 ThreadsPerCore=1
NodeName=worker2 CoresPerSocket=2 Sockets=1 ThreadsPerCore=1

# Partitions
PartitionName=develop Default=YES MaxTime=UNLIMITED Nodes="worker1,worker2"
```

When running a simple `srun sleep 10`, all works well and the log file shows:

[2024-05-15T12:34:12.741] sched: _slurm_rpc_allocate_resources JobId=1 NodeList=worker1 usec=549
[2024-05-15T12:34:22.775] _job_complete: JobId=1 WEXITSTATUS 0
[2024-05-15T12:34:22.775] _job_complete: JobId=1 done

But when we create a script with the same sleep command and submit it using 
`sbatch test.sh`, the log shows:

[2024-05-15T12:35:39.916] _slurm_rpc_submit_batch_job: JobId=2 InitPrio=1 usec=368
[2024-05-15T12:35:40.000] error: _refresh_assoc_mgr_qos_list: no new list given back keeping cached one.
[2024-05-15T12:35:40.000] sched: JobId=2 has invalid account
[2024-05-15T12:35:40.145] sched/backfill: _start_job: Started JobId=2 in develop on worker1
[2024-05-15T12:35:50.172] _job_complete: JobId=2 WEXITSTATUS 0
[2024-05-15T12:35:50.172] _job_complete: JobId=2 done
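
For reference, here is a minimal sketch of a batch script of the kind described above. The job name and output file name are illustrative assumptions, not necessarily what the real test.sh contains; the partition name `develop` is taken from the slurm.conf shown earlier.

```shell
# Write a minimal batch script to test.sh (contents are a sketch, see above).
cat > test.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=sleep-test          # assumed name, for illustration only
#SBATCH --partition=develop            # partition defined in slurm.conf
#SBATCH --output=sleep-test-%j.out     # assumed output file; %j expands to the job ID
sleep 10
EOF

# On the cluster, the script would then be submitted with:
#   sbatch test.sh
```

The `sbatch` call is left as a comment so the snippet can be run on a machine without Slurm installed.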

We have the same account with the same UID and GID, as the documentation 
says. Looking at the function that seems to emit that error 
(https://github.com/SchedMD/slurm/blob/e9f28ede27795f525e62f998cb2d40931d884e8b/src/common/assoc_mgr.c#L1952),
it appears that there should be some accounting setup? We do not have slurmdbd 
set up, and the documentation states that we should test basic functionality 
before implementing that daemon.

Any tips? Thanks in advance.
João
