[OMPI users] Performance problems with OpenMPI 5.0.5 and UCX 1.17.0 with QLogic InfiniBand

2024-09-30 Thread Patrick Begou via users
Hi, I'm working on refreshing an old cluster with AlmaLinux 9 (instead of CentOS 6 😕) and building a fresh OpenMPI 5.0.5 environment. I've reached the step where OpenMPI begins to work with UCX 1.17 and PMIx 5.0.3, but not completely. Nodes use a QLogic QDR HBA with a managed QLogic switch (
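
For reference, a minimal build sketch for this kind of stack (the install prefixes are illustrative, not the poster's actual paths):

    ./configure --prefix=/opt/openmpi-5.0.5 \
                --with-ucx=/opt/ucx-1.17.0 \
                --with-pmix=/opt/pmix-5.0.3
    make -j all && make install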

Re: [OMPI users] Performance problems with OpenMPI 5.0.5 and UCX 1.17.0 with QLogic InfiniBand

2024-09-30 Thread Nathan Hjelm via users
If this is a QLogic system, why not try psm2 (--mca pml cm --mca mtl psm2)? Not sure how good UCX support is on these systems, and psm2 is the vendor's library. Not sure what the right link to the current version is, but I found this one: GitHub - cornelisnetworks/opa-psm2 github.com -Nathan O
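
A sketch of the suggested launch line (the process count and binary name are placeholders, not from the thread):

    mpirun --mca pml cm --mca mtl psm2 -np 16 ./a.out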

Re: [OMPI users] Performance problems with OpenMPI 5.0.5 and UCX 1.17.0 with QLogic InfiniBand

2024-09-30 Thread Patrick Begou via users
Hi Nathan, thanks for this suggestion. My understanding was that everything is now managed by the UCX layer. Am I wrong? These options do not seem to work with my OpenMPI 5.0.5 build. But I built OpenMPI on the cluster front-end, and it had no HBA at that time. I've added one this evening (an old sp
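
One way to check what such a build actually contains (standard Open MPI and UCX inspection commands; the grep pattern is just illustrative):

    # list the PML/MTL components compiled into this Open MPI build
    ompi_info | grep -Ei 'pml|mtl'
    # list the network devices and transports UCX can see on this node
    ucx_info -d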

Re: [OMPI users] [EXTERNAL] Issue with mpirun inside a container

2024-09-30 Thread Gilles Gouaillardet via users
Jeff, there are several options... First, if you want to use containers and you are not tied to Docker, Singularity is a better fit. If you have a resource manager that features a PMIx server, you would simply direct-run. For example, with SLURM: srun singularity exec container.sif a.out I do not
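
A minimal sketch of that direct-run approach, assuming SLURM was built with PMIx support (node and task counts are placeholders):

    srun --mpi=pmix -N 2 -n 8 singularity exec container.sif ./a.out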

Re: [OMPI users] [EXTERNAL] Issue with mpirun inside a container

2024-09-30 Thread Jeffrey Layton via users
Gilles, this was exactly it - thank you. If I wanted to run the code in the container across multiple nodes, would I need to do something like "mpirun ... 'docker run ...'"? Thanks! Jeff On Mon, Sep 30, 2024 at 2:38 AM Gilles Gouaillardet via users <users@lists.open-mpi.org> wrote: > Jeffr
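
For a multi-node mpirun launch, the usual pattern is to wrap the application inside the container on each rank, e.g. with Singularity (the hostfile, process count, and image name are placeholders, and the image must be reachable on every node):

    mpirun -np 8 --hostfile hosts singularity exec container.sif ./a.out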