Re: [OMPI users] Singleton and Spawn

2019-09-25 Thread Ralph Castain via users
It's a different code path, that's all - just a question of what path gets traversed. Would you mind posting a little more info on your two use-cases? For example, do you have a default hostfile telling mpirun what machines to use? On Sep 25, 2019, at 12:41 PM, Martín Morales

Re: [OMPI users] Singleton and Spawn

2019-09-25 Thread Martín Morales via users
Thanks Ralph, but if I have a wrong hostfile path in my MPI_Comm_spawn function, why does it work if I run with mpirun (e.g. mpirun -np 1 ./spawnExample)? From: Ralph Castain Sent: Wednesday, September 25, 2019 15:42 To: Open MPI Users Cc: steven.va...@gmail.
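
For readers following this thread, here is a minimal sketch of the kind of call being discussed: a parent process (possibly started as a singleton, i.e. ./spawnExample run directly rather than under mpirun) asking Open MPI to spawn workers on hosts taken from a hostfile via the add-hostfile info key mentioned later in the thread. The worker binary name, process count and hostfile path are placeholders, not values from the original posts.

    /* Minimal singleton-spawn sketch; "./worker" and the hostfile path
     * are hypothetical placeholders. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm intercomm;
        MPI_Info info;
        int errcodes[2];

        MPI_Init(&argc, &argv);

        MPI_Info_create(&info);
        /* "add-hostfile" is the Open MPI info key referred to in this
         * thread; a wrong path here is where the singleton and mpirun
         * launches were observed to behave differently. */
        MPI_Info_set(info, "add-hostfile", "/path/to/hostfile");

        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, info, 0,
                       MPI_COMM_SELF, &intercomm, errcodes);

        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }

Run directly (./spawnExample) this goes through the singleton start-up; run as mpirun -np 1 ./spawnExample it takes the mpirun code path Ralph refers to above.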

Re: [OMPI users] UCX errors after upgrade

2019-09-25 Thread Jeff Squyres (jsquyres) via users
Thanks Raymond; I have filed an issue for this on GitHub and tagged the relevant Mellanox people: https://github.com/open-mpi/ompi/issues/7009 On Sep 25, 2019, at 3:09 PM, Raymond Muno via users wrote: We are running against 4.0.2RC2 now. This is using

Re: [OMPI users] UCX errors after upgrade

2019-09-25 Thread Raymond Muno via users
As a test, I rebooted a set of nodes. The user could run on 480 cores, on 5 nodes. We could not run beyond two nodes prior to that. We still get the VM_UNMAP warning, however. On 9/25/19 2:09 PM, Raymond Muno via users wrote: We are running against 4.0.2RC2 now. This is using current Inte

Re: [OMPI users] UCX errors after upgrade

2019-09-25 Thread Raymond Muno via users
We are running against 4.0.2RC2 now. This is using current Intel compilers, version 2019 Update 4. Still having issues. [epyc-compute-1-3.local:17402] common_ucx.c:149  Warning: UCX is unable to handle VM_UNMAP event. This may cause performance degradation or data corruption. [epyc-compute-1-3.

Re: [OMPI users] Singleton and Spawn

2019-09-25 Thread Ralph Castain via users
Yes, of course it can - however, I believe there is a bug in the add-hostfile code path. We can address that problem far more easily than moving to a different interconnect. On Sep 25, 2019, at 11:39 AM, Martín Morales via users wrote: Thanks Steven. So, actually

Re: [OMPI users] Singleton and Spawn

2019-09-25 Thread Martín Morales via users
Thanks Steven. So, actually it can’t spawn from a singleton? From: users on behalf of Steven Varga via users Sent: Wednesday, September 25, 2019 14:50 To: Open MPI Users Cc: Steven Varga Subject: Re: [OMPI users] Singleton and Spawn As far as I know

Re: [OMPI users] UCX errors after upgrade

2019-09-25 Thread Jeff Squyres (jsquyres) via users
Can you try the latest 4.0.2rc tarball? We're very, very close to releasing v4.0.2... I don't know if there's a specific UCX fix in there, but there are a ton of other good bug fixes since v4.0.1. On Sep 25, 2019, at 2:12 PM, Raymond Muno via users

[OMPI users] UCX errors after upgrade

2019-09-25 Thread Raymond Muno via users
We are primarily using Open MPI 3.1.4 but also have 4.0.1 installed. On our cluster, we were running CentOS 7.5 with updates, alongside MLNX_OFED 4.5.x. Open MPI was compiled with the GCC, Intel, PGI and AOCC compilers. We could run with no issues. To accommodate updates needed to get our IB gear

Re: [OMPI users] Singleton and Spawn

2019-09-25 Thread Steven Varga via users
As far as I know you have to wire up the connections among MPI clients, allocate resources, etc. PMIx is a library that sets up all processes, and it is shipped with Open MPI. The standard HPC method to launch tasks is through job schedulers such as SLURM or Grid Engine. SLURM's srun is very similar to mpirun:

[OMPI users] Singleton and Spawn

2019-09-25 Thread Martín Morales via users
Hi all! This is my first post. I'm a newbie with Open MPI (and with MPI likewise!). I recently built the current version of this fabulous software (v4.0.1) on two Ubuntu 18 machines (a small part of our Beowulf cluster). I have already read (a lot of) the FAQ and posts on the users mailing list, but I can't figur

Re: [OMPI users] Do not use UCX for shared memory

2019-09-25 Thread Adrian Reber via users
Thanks, that works. I also opened a UCX bug report and already got a fix for it: https://github.com/openucx/ucx/issues/4224 With that patch, UCX also detects the user namespace correctly. Adrian On Tue, Sep 24, 2019 at 12:12:54PM -0600, Nathan Hjelm wrote: > You can use the uct bt

Re: [OMPI users] silent failure for large allgather

2019-09-25 Thread Heinz, Michael William via users
Emmanuel Thomé, thanks for bringing this to our attention. It turns out this issue affects all OFI providers in Open MPI. We've applied a fix to the 3.0.x and later branches of open-mpi/ompi on GitHub. However, you should be aware that this fix simply adds the appropriate error message; it does
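
As a stopgap on the application side - a minimal sketch, not the fix referenced above - one can switch MPI_COMM_WORLD to MPI_ERRORS_RETURN and check the return code of MPI_Allgather, so that a collective the library does report as failing surfaces as an explicit error rather than passing silently. Buffer sizes, datatypes and the abort policy here are illustrative only.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size, rc;
        /* Illustrative per-rank payload; large counts are where the
         * provider size limit was being exceeded in this thread. */
        const int count = 1 << 20;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *sendbuf = malloc(count * sizeof(double));
        double *recvbuf = malloc((size_t)count * size * sizeof(double));
        for (int i = 0; i < count; i++) sendbuf[i] = rank;

        /* Return errors to the caller instead of aborting, so a failed
         * collective can be detected and reported by the application. */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        rc = MPI_Allgather(sendbuf, count, MPI_DOUBLE,
                           recvbuf, count, MPI_DOUBLE, MPI_COMM_WORLD);
        if (rc != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(rc, msg, &len);
            fprintf(stderr, "rank %d: MPI_Allgather failed: %s\n", rank, msg);
            MPI_Abort(MPI_COMM_WORLD, rc);
        }

        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }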