Hi everyone,

I am currently stuck with an sacct issue and would appreciate any help/hints/ideas:

My users can no longer retrieve data for their currently running jobs through sacct. Running sacct -a as root reproduces the issue: it does not show running jobs, although both sacct -j <JobID> and squeue -j <JobID> do. AFAICT, this is not intended behavior (?). Specifying longer time windows with sacct -S ... -E did not help either.

root@slurmmaster:~# sacct -a | grep 154415 # this returns nothing
root@slurmmaster:~# sacct -j 154415
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
154415       allocation               primevo          0    PENDING      0:0
154415.batch      batch               primevo          2    RUNNING      0:0
154415.exte+     extern               primevo          2    RUNNING      0:0
root@slurmmaster:~# squeue -j 154415
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            154415  standard genedrop username  R       1:31      1 hpc020
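
A rough sketch for checking whether other running jobs are affected in the same way, not just 154415: it compares the job IDs reported by squeue with those listed by sacct -a and prints the ones sacct omits. The function names missing_ids and check_accounting are made up for illustration, and the comparison assumes sacct's default time window.

```shell
#!/bin/sh
# Sketch: list job IDs that squeue reports as running but sacct -a omits.
# missing_ids and check_accounting are hypothetical helper names.

# Print IDs found in file $1 (squeue view) that are absent from file $2 (sacct view).
missing_ids() {
    sort -u "$1" | grep -Fxv -f "$2"
}

# Collect both views from the cluster and compare them.
check_accounting() {
    running=$(mktemp)
    accounted=$(mktemp)
    # -h: no header, -t RUNNING: running jobs only, %A: numeric job ID
    squeue -h -t RUNNING -o '%A' > "$running"
    # -a: all users, -n: no header, -X: allocations only (no steps),
    # -P: unpadded output, JobIDRaw: numeric job ID (also for array tasks)
    sacct -a -n -X -P -o JobIDRaw > "$accounted"
    missing_ids "$running" "$accounted"
    rm -f "$running" "$accounted"
}

# Only query the cluster when the Slurm client tools are installed.
if command -v squeue >/dev/null 2>&1 && command -v sacct >/dev/null 2>&1; then
    check_accounting
fi
```

An empty result would mean sacct -a lists every running job; any IDs printed are running jobs that only show up with sacct -j.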

Possibly related: slurmdbd crashed before this behavior started.

We run Ubuntu Server 24.04 LTS with Slurm 24.05.4, using a MariaDB accounting database hosted on the same machine as the Slurm controller.

Does anyone here have any ideas?

Best,
Pierre

--
Pierre Abele, M.Sc.

HPC Administrator
Max-Planck-Institute for Evolutionary Anthropology
Department of Primate Behavior and Evolution

Deutscher Platz 6
04103 Leipzig

Room: U2.80
E-Mail: pierre_ab...@eva.mpg.de
Phone: +49 (0) 341 3550 245

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
