We have a 25-node heterogeneous cluster with 4 different types of GPUs. To
see which GPU types were used most over a time period, we would have to set
AccountingStorageTRES to something like:
AccountingStorageTRES=gres/gpu,gres/gpu:rtx8000,gres/gpu:v100s,gres/gpu:a40,gres/gpu:a100

Unfortunately it's currently at:
AccountingStorageTRES=gres/gpu

At least each node contains only one type of GPU. What are some good
sreport options to get details on usage over a year, e.g., percentage of
CPU vs GPU, which partitions/accounts used the most GPUs, etc.?

From this example:
sreport -tminper -t Percent cluster utilization --tres="cpu,gres/gpu" start=2023-07-01
--------------------------------------------------------------------------------
Cluster Utilization 2023-07-01T00:00:00 - 2024-08-15T23:59:59
Usage reported in Percentage of Total
--------------------------------------------------------------------------------
  Cluster      TRES Name   Allocated       Down PLND Dow        Idle   Reserved    Reported
--------- -------------- ----------- ---------- -------- ----------- ---------- -----------
  cluster            cpu      43.81%      2.87%    0.00%      48.35%      4.97%      99.86%
  cluster       gres/gpu      50.36%      3.59%    0.00%      46.05%      0.00%     100.38%

Is that showing that 50% of all jobs were run with GPUs? How do we read the
Idle column? Why does Reported show > 100% for gres?
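
For what it's worth, here is a quick sanity check on those numbers: a small Python sketch with the values copied by hand from the output above. It adds up the five time columns (Allocated, Down, PLND Dow, Idle, Reserved) for each TRES and prints the total next to Reported. I am assuming that Reported is a separate figure and not part of that sum. That is only my guess from the column layout.

```python
# Values copied by hand from the sreport output above (percent of total).
# "PLND Dow" is the truncated column header exactly as sreport printed it.
rows = {
    "cpu": {
        "Allocated": 43.81, "Down": 2.87, "PLND Dow": 0.00,
        "Idle": 48.35, "Reserved": 4.97, "Reported": 99.86,
    },
    "gres/gpu": {
        "Allocated": 50.36, "Down": 3.59, "PLND Dow": 0.00,
        "Idle": 46.05, "Reserved": 0.00, "Reported": 100.38,
    },
}


def column_sum(row):
    """Sum every column except Reported (assumed to be a separate figure)."""
    return round(sum(v for k, v in row.items() if k != "Reported"), 2)


for name, row in rows.items():
    print(f"{name}: columns sum to {column_sum(row):.2f}%, "
          f"Reported = {row['Reported']:.2f}%")
```

Both rows add up to 100.00%, so the five columns appear to split the same total, and Reported is the only value that drifts away from 100%.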