Re: [OMPI users] [Help] Must orted exit after all spawned processes exit

2021-05-19 Thread Ralph Castain via users
To answer your specific questions: The backend daemons (orted) will not exit until all locally spawned procs exit. This is not configurable - for one thing, OMPI procs will suicide if they see the daemon depart, so it makes no sense to have the daemon fail if a proc terminates. The logic behind

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Peter Kjellström via users
On Wed, 19 May 2021 15:53:50 +0200 Pavel Mezentsev via users wrote: > It took some time but my colleague was able to build OpenMPI and get > it working with OmniPath, however the performance is quite > disappointing. The configuration line used was the > following: ./configure --prefix=$INSTALL_P

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Heinz, Michael William via users
Right. There was a reference counting issue in OMPI that required a change to PSM2 to properly fix. There's a configuration option to disable the reference count check at build time, although I don't recall what the option is off the top of my head. From: Carlson, Timothy S Sent: Wednesday, M

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Heinz, Michael William via users
After thinking about this for a few more minutes, it occurred to me that you might be able to "fake" the required UUID support by passing it as a shell variable. For example: export OMPI_MCA_orte_precondition_transports="0123456789ABCDEF-0123456789ABCDEF" would probably do it. However, note tha
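A minimal sketch of the workaround suggested above, assuming that any value shaped like the example (two 16-hex-digit halves joined by a dash) is acceptable. The thread only says this "would probably do it", so treat it as untested. The helper name `gen_half` is hypothetical. Since the key is meant to identify the job across nodes, the value presumably has to be generated once and exported before launch, so that every process in the job sees the same string.

```shell
#!/bin/sh
# Hypothetical helper: read 8 random bytes and print them as
# 16 uppercase hex digits with no separators.
gen_half() {
    od -An -N8 -tx1 /dev/urandom | tr -d ' \n' | tr 'a-f' 'A-F'
}

# Assumption: the key has the same shape as the example in the message,
# i.e. <16 hex digits>-<16 hex digits>.
key="$(gen_half)-$(gen_half)"

# Export once, before launching the job, so all ranks inherit the same value.
export OMPI_MCA_orte_precondition_transports="$key"
echo "$OMPI_MCA_orte_precondition_transports"
```

Generating a fresh key per job rather than reusing the fixed example string keeps concurrent jobs from sharing a key, if that matters on the fabric in question.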

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Heinz, Michael William via users
So, the bad news is that the PSM2 MTL requires ORTE - ORTE generates a UUID to identify the job across all nodes in the fabric, allowing processes to find each other over OPA at init time. I believe the reason this works when you use OFI/libfabric is that libfabric generates its own UUIDs. Fr

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Ralph Castain via users
The original configure line is correct ("--without-orte") - just a typo in the later text. You may be running into some issues with Slurm's built-in support for OMPI. Try running it with OMPI's "mpirun" instead and see if you get better performance. You'll have to reconfigure to remove the "--w

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Jorge D'Elia via users
- Original message - > From: "Pavel Mezentsev via users" > To: users@lists.open-mpi.org > CC: "Pavel Mezentsev" > Sent: Wednesday, 19 May 2021 10:53:50 > Subject: Re: [OMPI users] unable to launch a job on a system with OmniPath > > It took some time but my colleague was able to bui

Re: [OMPI users] unable to launch a job on a system with OmniPath

2021-05-19 Thread Pavel Mezentsev via users
It took some time but my colleague was able to build OpenMPI and get it working with OmniPath, however the performance is quite disappointing. The configuration line used was the following: ./configure --prefix=$INSTALL_PATH --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --enable-shared -