Hi everyone,

I am currently stuck with an sacct issue and would appreciate any help, hints, or ideas.

My users can no longer retrieve job data for their currently running jobs through sacct. Running sacct -a as root reproduces the issue: it does not show running jobs, but both sacct -j <JobID> and squeue -j <JobID> do. AFAICT, this is not intended behavior. Extending the time window with sacct -S ... -E did not help either.
root@slurmmaster:~# sacct -a | grep 154415 # this returns nothing
root@slurmmaster:~# sacct -j 154415
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
154415       allocation               primevo          0    PENDING      0:0
154415.batch      batch               primevo          2    RUNNING      0:0
154415.exte+     extern               primevo          2    RUNNING      0:0
root@slurmmaster:~# squeue -j 154415
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            154415  standard genedrop username  R       1:31      1 hpc020

Also, possibly related: we had a slurmdbd crash before this behavior changed.

We run Ubuntu Server 24.04 LTS with Slurm 24.05.4, using a MariaDB accounting database hosted on the same machine as the Slurm controller.
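To check whether this affects every running job and not only 154415, a minimal sketch like the one below could compare the job IDs reported by squeue with those listed by sacct -a. The helper name missing_jobs is made up for illustration, and the sketch assumes both tools print job IDs in the same form, which may not hold for array or heterogeneous jobs.

```shell
#!/bin/sh
# Hypothetical helper: print every job ID from the first newline-separated
# list ($1) that does not appear as an exact line in the second list ($2).
missing_jobs() {
    printf '%s\n' "$1" | while IFS= read -r id; do
        # Skip empty lines (e.g. when the first list is empty).
        [ -n "$id" ] || continue
        # -x: whole-line match, -F: fixed string, -q: quiet.
        printf '%s\n' "$2" | grep -qxF -- "$id" || printf '%s\n' "$id"
    done
}

# Only query Slurm if both tools are available on this machine.
if command -v squeue >/dev/null 2>&1 && command -v sacct >/dev/null 2>&1; then
    # Running jobs according to the controller (no header, job ID only).
    running=$(squeue -h -t RUNNING -o %i)
    # Job allocations according to accounting (all users, no header, no steps).
    accounted=$(sacct -a -n -X -P -o JobID)
    missing_jobs "$running" "$accounted"
fi
```

Any ID printed here would be a job that squeue reports as running but that is absent from the sacct -a listing.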
Does anyone here have any ideas?

Best,
Pierre

--
Pierre Abele, M.Sc.
HPC Administrator
Max-Planck-Institute for Evolutionary Anthropology
Department of Primate Behavior and Evolution
Deutscher Platz 6
04103 Leipzig
Room: U2.80
E-Mail: pierre_ab...@eva.mpg.de
Phone: +49 (0) 341 3550 245
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com