Hello guys,

We are running GPU clusters with Slurm and SlurmDBD (19.05 series), and some of the GPUs seem to cause trouble for the jobs assigned to them. To investigate whether the trouble occurs on the same GPUs each time, I'd like to get the GPU indices of the completed jobs.
As I understand it, `scontrol show job` can show the indices (as IDX in the gres info) but cannot be used for completed jobs. `sacct -j`, on the other hand, works for completed jobs but does not print the indices.

Is there any way (commands, configuration, etc.) to see the allocated GPU indices for completed jobs?

Best regards,

--------------------------------------------
露崎 浩太 (Kota Tsuyuzaki)
kota.tsuyuzaki...@hco.ntt.co.jp
NTT Software Innovation Center
Distributed Processing Platform Technology Project
0422-59-2837
---------------------------------------------