Re: [OMPI users] MPI Tool Information Interface (MPI_T), details on collective communication

2025-04-23 Thread 'George Bosilca' via Open MPI users
Anna, the monitoring PML tracks all activity on the PML but may choose to expose only the traffic the user is interested in, i.e., its own messages, and hide the rest. This is easy in OMPI because all internal messages are generated using negative tags (which are not allowed for user messages).

Re: [OMPI users] Disable PMPI bindings?

2025-02-17 Thread 'George Bosilca' via Open MPI users
I'm not sure I correctly understand the compiler complaint here, but I think it is complaining about a non-optional dummy argument being omitted from the call. In this case, I assume the issue arises in the mpif Fortran interface (not the f08 interface), because the error is not

Re: [OMPI users] Horovod Performance with OpenMPI

2025-06-04 Thread 'George Bosilca' via Open MPI users
What's the network on your cluster? Without a very good network you cannot get anywhere close to single-GPU performance, because the data exchanged between the two GPUs becomes the bottleneck. George. On Wed, Jun 4, 2025 at 5:56 AM Shruti Sharma wrote: > Hi > I am currently running Horovod

Re: [OMPI users] Horovod Performance with OpenMPI

2025-06-04 Thread 'George Bosilca' via Open MPI users
Please ignore my prior answer, I just noticed you are running single-node. In addition to Howard's suggestions, check whether you have NVLink between the GPUs (e.g., with `nvidia-smi topo -m`). George.

Re: [OMPI users] Openmpi5.0.7 causes fatal timeout on last rank

2025-07-02 Thread 'George Bosilca' via Open MPI users
OMPI 5.x has no support for the openib BTL; all IB traffic now goes through the UCX PML. This means that `-mca btl_openib_if_include XXX` is meaningless, but you can use the UCX_NET_DEVICES environment variable to direct UCX to a specific device. As the error happens for UD you can switch to a different transport

Re: [OMPI users] Openmpi5.0.7 causes fatal timeout on last rank

2025-07-02 Thread 'George Bosilca' via Open MPI users
UCX 1.8 or UCX 1.18? Your application does not exchange any data, so it is possible that MPICH's behavior differs from OMPI's (i.e., not creating connections vs. creating them during MPI_Init). That's why running a slightly different version of the hello_world with a barrier would clarify the connection'

Re: [OMPI users] Openmpi5.0.7 causes fatal timeout on last rank

2025-07-01 Thread 'George Bosilca' via Open MPI users
This error message is usually due to a misconfiguration of the network. However, I don't think that is the case here, because the output contains messages from both odd and even ranks (which, according to your binding policy, were placed on different nodes), suggesting that at least some of the processes we