Anna,
The monitoring PML tracks all activity on the PML but might choose to expose
only the traffic the user is interested in, aka the user's own messages,
and hide the rest of the traffic. This is easy in OMPI because all internal
messages are generated using negative tags (which are not allowed f
I'm not sure if I correctly understand the compiler complaint here, but I
think it is complaining about a non-optional dummy argument being
omitted from the call. In this case, I assume the issue is raised in the
mpif Fortran interface (not the f08 interface), due to the fact that the
error is not
What's the network on your cluster? Without a very good network you cannot
obtain anything close to single-GPU performance, because the data exchanged
between the two GPUs will become the bottleneck.
George.
On Wed, Jun 4, 2025 at 5:56 AM Shruti Sharma wrote:
> Hi
> I am currently running Horovod
Please ignore my prior answer, I just noticed you are running single-node.
In addition to Howard's suggestions, check if you have NVLink between GPUs.
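One way to check, assuming NVIDIA GPUs with the nvidia-smi tool installed: the
topology matrix shows NV# between GPU pairs connected by NVLink, and entries
such as PIX, PHB or SYS when traffic goes over PCIe or across sockets. The
have_nvidia_smi variable below is only an illustrative name.

```shell
# Guarded so the snippet is harmless on machines without the NVIDIA tools.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi topo -m          # GPU-to-GPU connectivity matrix
    nvidia-smi nvlink --status  # per-link NVLink state, if any links exist
    have_nvidia_smi=yes
else
    echo "nvidia-smi not found; cannot inspect GPU topology here"
    have_nvidia_smi=no
fi
```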
George.
On Wed, Jun 4, 2025 at 10:11 AM George Bosilca wrote:
> What's the network on your cluster ? Without a very good network you
> canno
OMPI 5.x has no support for the openib BTL; all IB traffic now goes
through the UCX PML. This means that `-mca btl_openib_if_include XXX` is
meaningless, but you can use the UCX_NET_DEVICES environment variable to
direct UCX to a specific device.
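A minimal sketch of that redirection, assuming a device named mlx5_0 on port 1
(a placeholder; substitute whatever your nodes actually report) and a
hypothetical binary ./hello_world:

```shell
# Placeholder device:port; replace it with a device present on your nodes.
# If UCX is installed, `ucx_info -d` lists the devices it can see.
export UCX_NET_DEVICES=mlx5_0:1

# Launch line, commented out because ./hello_world is a hypothetical binary.
# -x forwards the variable to every rank.
# mpirun -np 2 --mca pml ucx -x UCX_NET_DEVICES ./hello_world
echo "UCX_NET_DEVICES=${UCX_NET_DEVICES}"
```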
As the error happens for UD you can switch to a different transport
UCX 1.8 or UCX 1.18?
Your application does not exchange any data so it is possible that MPICH
behavior differs from OMPI (aka not creating connections vs creating them
during MPI_Init). That's why running a slightly different version of the
hello_world with a barrier would clarify the connection'
This error message is usually due to a misconfiguration of the network.
However, I don't think this is the case here because the output contains
messages from both odd and even ranks (which according to your binding
policy were placed on different nodes) suggesting at least some of the
processes we