Please ignore my prior answer; I just noticed you are running single-node.
In addition to Howard's suggestions, check whether you have NVLink between the GPUs.
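For a quick check (assuming a reasonably recent driver), the interconnect topology between the GPUs can be printed with:

```shell
# Show the GPU interconnect matrix; "NV#" entries indicate NVLink,
# while PHB/PIX/SYS indicate various PCIe / system paths.
nvidia-smi topo -m
```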
George.
On Wed, Jun 4, 2025 at 10:11 AM George Bosilca wrote:
> What's the network on your cluster? Without a very good network you
> cannot obtain anything closer to the single GPU …
Hi Shruti,
What version of NCCL is installed on the system? Horovod has environment
variables you can set to force use of NCCL.
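If I remember correctly, those are set at install time when Horovod is built from pip, along the lines of:

```shell
# Build-time flag that forces Horovod's GPU collective operations
# onto NCCL (exact flag names are in the Horovod install docs).
HOROVOD_GPU_OPERATIONS=NCCL pip install --no-cache-dir horovod
```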
You may want to use nvidia-smi to double-check whether the benchmark is
actually using both GPUs when running two MPI processes.
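Something like the following, run in a second terminal while the benchmark is going, should show non-zero utilization on both devices:

```shell
# Sample per-GPU utilization and memory once per second; if only
# GPU 0 shows activity, the second rank is not using its GPU.
nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 1
```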
Also, you may want to consult with a t
--
Best regards
Udo Ziegenhagel
---
Dipl.-Phys. Udo Ziegenhagel
IT-Administrator
Computing Center
Fraunhofer-Institut für Kurzzeitdynamik
Ernst-Zermelo-Straße 4
D-79104 Freiburg
Germany
Tel. +49 (0) 761/2714-382
Fax +49 (0) 761/2714-1382
Email: z
What's the network on your cluster? Without a very good network you cannot
obtain anything closer to the single GPU, because the data exchanged
between the two GPUs will become the bottleneck.
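One way to see whether the inter-GPU link is the bottleneck is to measure raw allreduce bandwidth directly with NVIDIA's nccl-tests suite; roughly (flags from memory, treat as a sketch):

```shell
# Build and run the NCCL allreduce benchmark across both GPUs in
# the node; the "busbw" column approximates effective link bandwidth.
git clone https://github.com/NVIDIA/nccl-tests.git
cd nccl-tests && make
./build/all_reduce_perf -b 8 -e 256M -f 2 -g 2
```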
George.
On Wed, Jun 4, 2025 at 5:56 AM Shruti Sharma wrote:
> Hi
> I am currently running Horovod
Hi
I am currently running Horovod benchmarks in an intra-node setup. However,
I have observed that increasing the number of GPUs does not result in a
proportional increase in total throughput. Specifically, the throughput per
GPU with a single GPU is approximately 842.6 ± 2.4, whereas with two
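For what it's worth, scaling efficiency follows directly from the per-GPU numbers; the two-GPU figure below is a made-up placeholder, since the original message is cut off at that point:

```python
# Per-GPU throughput on 1 GPU, taken from the message above.
single_gpu = 842.6
# Per-GPU throughput on 2 GPUs -- hypothetical placeholder;
# the real number is truncated from the original message.
per_gpu_two = 700.0

# Scaling efficiency: achieved aggregate throughput divided by the
# ideal (linear) aggregate throughput.
efficiency = (2 * per_gpu_two) / (2 * single_gpu)
print(f"scaling efficiency: {efficiency:.1%}")  # -> scaling efficiency: 83.1%
```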