[OMPI users] PML issue with openmpi5 and ucx

2024-08-29 Thread Merritt, Todd R - (tmerritt) via users
I am having a devil of a time tracking down the cause of this error, and the debugging output from mpirun is not helpful to my mortal eyes, so I'm reaching out to the community here for some help. I've built Open MPI 5 with PMIx and UCX support. I'm running on a Slurm cluster with RoCE. Under Slurm,

Re: [OMPI users] [EXTERNAL] PML issue with openmpi5 and ucx

2024-08-29 Thread Pritchard Jr., Howard via users
Hi Todd, You may want to ask UCX what’s going wrong. See if setting this env variable provides more info: export UCX_LOG_LEVEL=debug Have you tried to run the UCX smoke tests? https://github.com/openucx/ucx?tab=readme-ov-file#running-internal-unit-tests Howard From: users on behalf of "Mer
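
A minimal sketch of the suggestion above, assuming a POSIX shell and that `ucx_info` and `mpirun` are on the PATH. The application name `./my_mpi_app` and the rank count are hypothetical placeholders, not taken from the thread, and each step is skipped if the tool or binary is absent.

```shell
#!/bin/sh
# Ask UCX for verbose diagnostics, as suggested in the reply above.
export UCX_LOG_LEVEL=debug

# Print the UCX version and build configuration, if UCX's ucx_info tool is installed.
if command -v ucx_info >/dev/null 2>&1; then
    ucx_info -v
fi

# ./my_mpi_app is a hypothetical placeholder for the failing program.
# -x forwards the named environment variable to the launched ranks;
# "--mca pml ucx" requests the UCX PML explicitly.
if command -v mpirun >/dev/null 2>&1 && [ -x ./my_mpi_app ]; then
    mpirun -x UCX_LOG_LEVEL --mca pml ucx -np 2 ./my_mpi_app
fi
```

The debug output is usually long, so redirecting it to a file and searching for the first error or warning from UCX tends to be easier than reading it in the terminal.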

[OMPI users] How to launch MPMD with partial oversubscription?

2024-08-29 Thread Jianyu Liu via users
Hi, PBS version : 2021.1.3 OpenMPI : 5.0.3 I’d like to do a partial oversubscription with MPMD, like this 1. Request 40 trunks in total 2. First group of 32 trunks with 128 cores per trunk, 1 MPI rank per core, 1 OMP thread per MPI rank 3. Second group of 8 trunks with 128 cores per tru