What's the network on your cluster? Without a very good network you cannot
get anywhere close to single-GPU throughput per GPU, because the data
exchanged between the two GPUs becomes the bottleneck.
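
Since both GPUs sit in the same node, the "network" here is really the
intra-node GPU interconnect (PCIe vs. NVLink). One quick check, assuming a
standard NVIDIA driver install, is the topology matrix:

  nvidia-smi topo -m

If the GPU0-GPU1 entry shows NV# the GPUs talk over NVLink; PIX, PHB, NODE,
or SYS mean the traffic crosses PCIe (and possibly the CPU), leaving far
less bandwidth for the allreduce.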

  George.


On Wed, Jun 4, 2025 at 5:56 AM Shruti Sharma <shrutic...@gmail.com> wrote:

> Hi,
> I am currently running Horovod benchmarks in an intra-node setup. However,
> I have observed that increasing the number of GPUs does not result in a
> proportional increase in total throughput. Specifically, the throughput
> with a single GPU is approximately 842.6 ± 2.4 img/sec, whereas with two
> GPUs the total throughput is around 485.7 ± 44.8 img/sec, i.e. roughly
> 242.8 ± 22.4 img/sec per GPU. In other words, the two-GPU run is slower in
> aggregate than the single-GPU run (485.7 / 842.6 ≈ 0.58x).
>
> The configuration for the test is:
> MPI: Open MPI 5.0.6
> Horovod: 0.28.1
> PyTorch: 1.12.1
> GPU: NVIDIA A100
> CUDA: 11.8
> Python: 3.10
> GCC: 8.5.0
>
> command: mpirun -n 1 --report-bindings python pytorch_synthetic_benchmark.py --batch-size=64 --model=resnet50
> [gpu39:59123] Rank 0 bound package[0][core:0]
>
> Model: resnet50
> Batch size: 64
> Number of GPUs: 1
> Running warmup...
> Running benchmark...
> Iter #0: 844.3 img/sec per GPU
> Iter #1: 844.0 img/sec per GPU
> Iter #2: 843.6 img/sec per GPU
> Iter #3: 843.5 img/sec per GPU
> Iter #4: 843.5 img/sec per GPU
> Iter #5: 842.0 img/sec per GPU
> Iter #6: 841.3 img/sec per GPU
> Iter #7: 841.8 img/sec per GPU
> Iter #8: 841.1 img/sec per GPU
> Iter #9: 841.1 img/sec per GPU
> Img/sec per GPU: 842.6 +-2.4
> Total img/sec on 1 GPU(s): 842.6 +-2.4
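>
> For reference, the rank-to-GPU mapping is the standard Horovod PyTorch
> setup; a paraphrased sketch of what pytorch_synthetic_benchmark.py does
> (not the verbatim script):
>
>   import torch
>   import torchvision.models as models
>   import horovod.torch as hvd
>
>   hvd.init()                               # one Horovod rank per mpirun process
>   torch.cuda.set_device(hvd.local_rank())  # rank 0 -> GPU 0, rank 1 -> GPU 1
>
>   model = models.resnet50().cuda()
>   optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
>   # DistributedOptimizer makes every optimizer step allreduce the gradients
>   optimizer = hvd.DistributedOptimizer(
>       optimizer, named_parameters=model.named_parameters())
>   hvd.broadcast_parameters(model.state_dict(), root_rank=0)
>
> Each mpirun rank therefore drives its own GPU, and the per-step gradient
> allreduce is the inter-GPU traffic in question.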
>
>
> Run with two GPUs on the same node:
> command: mpirun -n 2 --report-bindings python pytorch_synthetic_benchmark.py --batch-size=64 --model=resnet50
> [gpu39:59166] Rank 0 bound package[0][core:0]
> [gpu39:59166] Rank 1 bound package[0][core:1]
>
> Model: resnet50
> Batch size: 64
> Number of GPUs: 2
> Running warmup...
> Running benchmark...
> Iter #0: 235.7 img/sec per GPU
> Iter #1: 251.5 img/sec per GPU
> Iter #2: 217.0 img/sec per GPU
> Iter #3: 239.4 img/sec per GPU
> Iter #4: 257.2 img/sec per GPU
> Iter #5: 258.3 img/sec per GPU
> Iter #6: 248.4 img/sec per GPU
> Iter #7: 242.6 img/sec per GPU
> Iter #8: 238.0 img/sec per GPU
> Iter #9: 240.3 img/sec per GPU
> Img/sec per GPU: 242.8 +-22.4
> Total img/sec on 2 GPU(s): 485.7 +-44.8
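>
> One way to test whether that exchange is the bottleneck is a small
> allreduce micro-benchmark (a hypothetical sketch, not from the original
> thread; run it as "mpirun -n 2 python allreduce_bench.py"):
>
>   import time
>   import torch
>   import horovod.torch as hvd
>
>   hvd.init()
>   torch.cuda.set_device(hvd.local_rank())
>
>   # ~25.6M fp32 elements, roughly the size of ResNet-50's gradients (~100 MB)
>   tensor = torch.rand(25_600_000, device='cuda')
>
>   hvd.allreduce(tensor, name='warmup')  # exclude NCCL/MPI setup from timing
>   torch.cuda.synchronize()
>
>   start = time.time()
>   for i in range(10):
>       hvd.allreduce(tensor, name='bench_%d' % i)
>   torch.cuda.synchronize()
>   per_op = (time.time() - start) / 10
>
>   if hvd.rank() == 0:
>       gib = tensor.numel() * 4 / 2**30
>       print('allreduce of %.2f GiB took %.1f ms (~%.1f GiB/s)'
>             % (gib, per_op * 1e3, gib / per_op))
>
> If that bandwidth comes out at only a few GiB/s, a 64-image batch does not
> give each GPU enough compute to hide the gradient exchange, which would
> explain the drop you see.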
>