I am confused by the amount of Down and PLND Down time that sreport reports.
According to it, our cluster had a significant amount of downtime, which I
know did not happen. (According to the documentation, Down can also mean
"time that slurmctld was not responding"; see
https://slurm.schedmd.com/sreport.html.)

Could my purge settings be causing this? How can I check (in some logs, or
going forward) whether slurmctld was actually not responding? I would expect
the long-term numbers to be lower than the ones reported for last month,
when we had an issue with a few nodes.

Thanks!

[davide@login ~]$ grep Purge /opt/slurm/slurmdbd.conf
#JobPurge=12
#StepPurge=1
PurgeEventAfter=1month
PurgeJobAfter=12month
PurgeResvAfter=1month
PurgeStepAfter=1month
PurgeSuspendAfter=1month
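
One way to approach the "how can I check" question is to look at the node and cluster events that slurmdbd has recorded, which `sacctmgr show event` can list. The following is a minimal Python sketch, not a definitive tool: it assumes `sacctmgr -n -P show event` prints pipe-separated rows in ISO timestamp format, that an End value which is not a timestamp means the event is still open, and that rows with an empty NodeName are cluster-level events. The `SAMPLE` rows are made up for illustration and are not output from this cluster. With PurgeEventAfter=1month, event records older than the retention period would presumably no longer be available to list.

```python
import subprocess
from collections import defaultdict
from datetime import datetime

# Reason goes last so a "|" inside the free-text reason cannot shift columns.
FIELDS = ["NodeName", "Start", "End", "State", "Reason"]


def fetch_events(start, end):
    """Run sacctmgr and return its raw output (needs a working Slurm install)."""
    cmd = ["sacctmgr", "-n", "-P", "show", "event",
           f"Start={start}", f"End={end}", "format=" + ",".join(FIELDS)]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_events(text):
    """Turn pipe-separated rows into dicts keyed by FIELDS; skip malformed rows."""
    events = []
    for line in text.splitlines():
        parts = line.split("|", len(FIELDS) - 1)
        if len(parts) == len(FIELDS):
            events.append(dict(zip(FIELDS, parts)))
    return events


def down_hours(events, window_end):
    """Sum event duration in hours per node, clipping open events at window_end."""
    totals = defaultdict(float)
    for ev in events:
        start = datetime.fromisoformat(ev["Start"])
        try:
            end = datetime.fromisoformat(ev["End"])
        except ValueError:
            end = window_end  # assumption: a non-timestamp End means "still open"
        end = min(end, window_end)
        name = ev["NodeName"] or "(cluster)"  # assumption: empty name = cluster event
        totals[name] += max((end - start).total_seconds(), 0.0) / 3600.0
    return dict(totals)


# Hypothetical sample rows, for illustration only.
SAMPLE = """\
node01|2024-08-10T09:00:00|2024-08-10T15:00:00|DOWN|Not responding
node02|2024-08-12T00:00:00|Unknown|DOWN*|hw fault
"""
result = down_hours(parse_events(SAMPLE), datetime(2024, 8, 12, 12, 0, 0))
print(result)
```

On a real system, `parse_events(fetch_events("2024-07-07", "2024-08-22"))` would replace the sample. Comparing the per-node totals with the Down column for the same window may help tell ordinary node outages apart from time attributed to slurmctld itself.
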
[davide@login ~]$ sreport -t percent -T cpu,mem cluster utilization start=2/1/22
Cluster Utilization 2022-02-01T00:00:00 - 2024-08-21T23:59:59
Usage reported in Percentage of Total
  Cluster TRES Name  Allocated       Down  PLND Down       Idle    Planned   Reported
--------- --------- ---------- ---------- ---------- ---------- ---------- ----------
  cluster       cpu     19.50%     12.07%      3.92%     64.36%      0.15%    100.03%
  cluster       mem     16.13%     13.17%      4.56%     66.13%      0.00%     99.99%
[davide@login ~]$ sreport -t percent -T cpu,mem cluster utilization start=2/1/23
Cluster Utilization 2023-02-01T00:00:00 - 2024-08-21T23:59:59
Usage reported in Percentage of Total
  Cluster TRES Name  Allocated       Down  PLND Down       Idle    Planned   Reported
--------- --------- ---------- ---------- ---------- ---------- ---------- ----------
  cluster       cpu     28.74%     18.80%      6.44%     45.77%      0.24%    100.02%
  cluster       mem     22.52%     20.54%      7.38%     49.55%      0.00%     99.98%
[davide@login ~]$ sreport -t percent -T cpu,mem cluster utilization start=2/1/24
Cluster Utilization 2024-02-01T00:00:00 - 2024-08-21T23:59:59
Usage reported in Percentage of Total
  Cluster TRES Name  Allocated       Down  PLND Down       Idle    Planned   Reported
--------- --------- ---------- ---------- ---------- ---------- ---------- ----------
  cluster       cpu     29.92%     24.88%     17.73%     27.45%      0.02%    100.00%
  cluster       mem     20.07%     28.60%     19.57%     31.76%      0.00%    100.00%
[davide@login ~]$ sreport -t percent -T cpu,mem cluster utilization start=8/8/24
Cluster Utilization 2024-08-08T00:00:00 - 2024-08-21T23:59:59
Usage reported in Percentage of Total
  Cluster TRES Name  Allocated       Down  PLND Down       Idle    Planned   Reported
--------- --------- ---------- ---------- ---------- ---------- ---------- ----------
  cluster       cpu     15.96%      2.53%      0.00%     81.51%      0.00%    100.00%
  cluster       mem      9.18%      2.22%      0.00%     88.60%      0.00%    100.00%
[davide@login ~]$ sreport -t percent -T cpu,mem cluster utilization start=7/7/24
Cluster Utilization 2024-07-07T00:00:00 - 2024-08-21T23:59:59
Usage reported in Percentage of Total
  Cluster TRES Name  Allocated       Down  PLND Down       Idle    Planned   Reported
--------- --------- ---------- ---------- ---------- ---------- ---------- ----------
  cluster       cpu     27.07%      2.57%      0.00%     70.34%      0.02%    100.00%
  cluster       mem     17.35%      2.26%      0.00%     80.40%      0.00%    100.00%
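
The expectation that long-term numbers should be lower than last month's can be made concrete with a little arithmetic. The sketch below is only an illustration under two stated assumptions: the cluster size was roughly constant over the whole period, and the only real downtime was the node issue captured in the start=7/7/24 report. The function name `diluted_percent` is made up for this example.

```python
from datetime import date


def diluted_percent(pct_short, short_days, long_days):
    """Spread a Down percentage measured over a short window across a longer one.

    Assumes constant cluster size and no downtime outside the short window.
    """
    return pct_short * short_days / long_days


# Reports end at 2024-08-21T23:59:59, so use 2024-08-22 as the exclusive end.
end = date(2024, 8, 22)
short_days = (end - date(2024, 7, 7)).days   # window of the start=7/7/24 report
long_days = (end - date(2024, 2, 1)).days    # window of the start=2/1/24 report

cpu_down_short = 2.57      # CPU Down % in the start=7/7/24 report
cpu_down_reported = 24.88  # CPU Down % in the start=2/1/24 report

expected = diluted_percent(cpu_down_short, short_days, long_days)
print(f"expected at most about {expected:.2f}% Down, reported {cpu_down_reported:.2f}%")
```

Under those assumptions the start=2/1/24 report should show roughly 0.58% CPU Down, not 24.88%, so the recent node issue alone does not account for the long-window figure.
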
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com