Hoping someone can tell me whether I’m just thinking about this wrong, or
whether this is an area with room for improvement.

I recently upgraded my cluster to 22.05.8 and am testing out GPU sharding on a
subset of GPUs, specifically my T4s.

> -------------------------------------------------------------------------------
> Cluster Utilization 2023-02-13T00:00:00 - 2023-02-13T23:59:59
> Usage reported in Percentage of Total
> -------------------------------------------------------------------------------
>      TRES Name Allocate        Down PLND Dow         Idle  Planned     Reported
> -------------- -------- ----------- -------- ------------ -------- ------------
>    gres/gpu:t4    0.00%       0.00%    0.00%      100.00%    0.00%      100.00%
>     gres/shard   37.06%       0.00%    0.00%       62.94%    0.00%      100.00%

What seems odd to me is that I have shards being consumed, which implicitly
consumes the gpu:t4 devices they live on.
However, sreport makes it appear as though the T4s were completely idle, which
is not true.

I know that shards and GPUs are not allocated 1:1; if anything, the GPU
allocation should almost always be greater than the shard allocation.
That is what I would expect the report to show, given that the GPUs are not
idle and are in fact allocated, if only “partially.”
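
To put rough numbers on that intuition, here is a minimal sketch in Python. It
is not anything Slurm or sreport computes. It assumes every GPU exposes the
same number of shards and that a GPU counts as allocated whenever at least one
of its shards is in use. The function name and the value of 4 shards per GPU
are hypothetical, chosen only for illustration.

```python
def gpu_allocation_bounds(shard_fraction, shards_per_gpu):
    """Bound the fraction of GPUs that have at least one shard in use.

    Assumptions (not Slurm behaviour): all GPUs expose the same number of
    shards, and a GPU counts as allocated if any of its shards is allocated.
    """
    if not 0.0 <= shard_fraction <= 1.0:
        raise ValueError("shard_fraction must be between 0 and 1")
    if shards_per_gpu < 1:
        raise ValueError("shards_per_gpu must be at least 1")

    # Best case: shards are packed tightly, so busy GPUs are fully used.
    lower = shard_fraction
    # Worst case: each busy shard sits on a different GPU, capped at 100%.
    upper = min(1.0, shard_fraction * shards_per_gpu)
    return lower, upper


if __name__ == "__main__":
    # 37.06% is the gres/shard figure from the report; 4 shards per GPU
    # is a made-up value for the example.
    lower, upper = gpu_allocation_bounds(0.3706, 4)
    print(f"GPUs in use: between {lower:.2%} and {upper:.2%}")
```

Under those assumptions the GPU figure can never fall below the shard figure,
which is why 0.00% allocated for gres/gpu:t4 looks wrong to me.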

I know shards are a new concept and likely will evolve over time, but wanted to 
see if anyone had run into or thought similarly about this concept.

Reed
