Re: [OMPI users] silent failure for large allgather

2019-09-13 Thread Emmanuel Thomé via users
Hi, Thanks Jeff for your reply, and sorry for this late follow-up... On Sun, Aug 11, 2019 at 02:27:53PM -0700, Jeff Hammond wrote: > > openmpi-4.0.1 gives essentially the same results (similar files > > attached), but with various doubts on my part as to whether I've run this > > check correctly.

[OMPI users] silent failure for large allgather

2019-08-06 Thread Emmanuel Thomé via users
Hi, In the attached program, the MPI_Allgather() call fails to communicate all data (the amount it communicates wraps around at 4G...). I'm running on an omnipath cluster (2018 hardware), openmpi 3.1.3 or 4.0.1 (tested both). With the OFI mtl, the failure is silent, with no error message reporte
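A hedged illustration of the arithmetic behind "wraps around at 4G": the excerpt does not identify the cause, so the sketch below only assumes that a byte count is reduced modulo 2^32 somewhere along the path. Both helper names are hypothetical and are not part of Open MPI or any MPI API.

```c
#include <stdint.h>

/* Hypothetical helper: total bytes each rank should end up holding
 * after an allgather of per_rank_bytes from each of nranks ranks. */
static uint64_t allgather_total_bytes(uint64_t per_rank_bytes, uint64_t nranks)
{
    return per_rank_bytes * nranks;
}

/* Hypothetical helper: what remains of a 64-bit byte count if it is
 * stored in a 32-bit field (ASSUMPTION: this models the wraparound;
 * the actual cause is not stated in the report). */
static uint64_t truncated_to_32bit(uint64_t nbytes)
{
    return (uint64_t)(uint32_t)nbytes;
}
```

Under that assumption, two ranks contributing 3 GiB each should yield 6 GiB in total, while a 32-bit byte count would represent only 2 GiB of it.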

[OMPI users] pml ^ucx + mtl ofi (nonsensical ?) --> segfault at large sizes

2019-07-19 Thread Emmanuel Thomé via users
Hi, I came across this. openmpi-4.0.1 compiled with: ../openmpi-4.0.1/configure --disable-mpi-fortran --without-cuda --disable-opencl --with-ucx=/path/to/ucx-1.5.1 The execution of the attached program (simple mpi_send / mpi_recv pair) gives a segfault when the message size exceeds 2^30. I'm see

Re: [OMPI users] wrong library version for dependent open-rte lib when libtool relinks

2016-02-27 Thread Emmanuel Thomé
> Note, too, that 1.10.2 has a bug that one of the core Open MPI libs has a > dependency on libibverbs (only Open MPI's plugins are supposed to be > dependent upon libibverbs). This was a mistake that is fixed in the 1.10.3 > nightly tarballs. Indeed, fixing this bug may have the side-effect o

Re: [OMPI users] wrong library version for dependent open-rte lib when libtool relinks

2016-02-27 Thread Emmanuel Thomé
Thanks for your analysis. On Sat, Feb 27, 2016 at 3:19 PM, Jeff Squyres (jsquyres) wrote: > [...] > 1. osmcomp should not have installed a .la file for a default linker location Probably not, although the no-brainer default solution does this (plus, the .la files say "do not delete"...). > 2. L

Re: [OMPI users] wrong library version for dependent open-rte lib when libtool relinks

2016-02-27 Thread Emmanuel Thomé
Here you go. http://www.loria.fr/~thome/vrac/logs.tar.bz2 E. On Sat, Feb 27, 2016 at 2:56 PM, Jeff Squyres (jsquyres) wrote: > Can you send all the build information listed here: > > https://www.open-mpi.org/community/help/ > > > >> On Feb 27, 2016, at 8:48 A

Re: [OMPI users] wrong library version for dependent open-rte lib when libtool relinks

2016-02-27 Thread Emmanuel Thomé
y chance, does libosmcomp.la contain a -rpath line ? > > FWIW, you can simply > make V=1 > In order to see how libtool is invoked, and how it will invoke bcc > > Cheers, > > Gilles > > On Saturday, February 27, 2016, Emmanuel Thomé > wrote: >> >> Hi, >

Re: [OMPI users] wrong library version for dependent open-rte lib when libtool relinks

2016-02-27 Thread Emmanuel Thomé
Hi, I attach both $builddir/ompi/libmpi.la and /usr/lib/libosmcomp.la (both from a system where I kept that file). /usr/lib/libosmcomp.la has no embedded rpath information. FWIW, this .la file comes from the file MLNX_OFED_LINUX-3.1-1.0.3-debian8.1-x86_64/DEBS/libopensm_4.6.0.MLNX20150830.c69ebab

Re: [OMPI users] wrong library version for dependent open-rte lib when libtool relinks

2016-02-27 Thread Emmanuel Thomé
> >> On Feb 26, 2016, at 8:24 AM, Emmanuel Thomé wrote: >> >> On Fri, Feb 26, 2016 at 5:21 PM, Emmanuel Thomé >> wrote: >>> happens to have an openmpi-1.6.5 installation in /usr, as well as . >> >> So

Re: [OMPI users] wrong library version for dependent open-rte lib when libtool relinks

2016-02-26 Thread Emmanuel Thomé
On Fri, Feb 26, 2016 at 5:21 PM, Emmanuel Thomé wrote: > happens to have an openmpi-1.6.5 installation in /usr, as well as . Sorry for copy-paste failure. 1.6.5 is only in /usr, of course. E.

[OMPI users] wrong library version for dependent open-rte lib when libtool relinks

2016-02-26 Thread Emmanuel Thomé
I have a problem with the build and install process of openmpi-1.10.2. I have here a machine running Debian GNU/Linux 8.2 ; this machine also happens to have an openmpi-1.6.5 installation in /usr, as well as . This should not matter, but here it does. The machine also has an Infiniband software s

Re: [OMPI users] mmaped memory and openib btl.

2014-12-02 Thread Emmanuel Thomé
filed https://github.com/open-mpi/ompi/issues/299; feel free to follow > it with your github account to follow the progress. > > > > On Nov 29, 2014, at 8:49 AM, Emmanuel Thomé wrote: > >> Hi, >> >> I am still affected by the bug which I reported in the thread

Re: [OMPI users] mmaped memory and openib btl.

2014-11-29 Thread Emmanuel Thomé
. On Thu, Nov 13, 2014 at 7:09 PM, Emmanuel Thomé wrote: > Hi, > > It turns out that the DT_NEEDED libs for my a.out are: > Dynamic Section: > NEEDED libmpi.so.1 > NEEDED libpthread.so.0 > NEEDED libc.so.6 > which is absolutel

Re: [OMPI users] mmaped memory and openib btl.

2014-11-13 Thread Emmanuel Thomé
On Wed, Nov 12, 2014 at 7:51 PM, Emmanuel Thomé wrote: > yes I confirm. Thanks for saying that this is the supposed behaviour. > > In the binary, the code goes to munmap@plt, which goes to the libc, > not to libopen-pal.so > > libc is 2.13-38+deb7u1 > > I'm a total noo

Re: [OMPI users] mmaped memory and openib btl.

2014-11-12 Thread Emmanuel Thomé
this (1-line) function: > > - > /* intercept munmap, as the user can give back memory that way as well. */ > OPAL_DECLSPEC int munmap(void* addr, size_t len) > { > return opal_memory_linux_free_ptmalloc2_munmap(addr, len, 0); > } > - > > > > On Nov 12, 2014, at 11:08

Re: [OMPI users] mmaped memory and openib btl.

2014-11-12 Thread Emmanuel Thomé
consider any mmap()/munmap() rather unsafe to play with in an openmpi application. E. P.S: a last version of the test case is attached. On 11 Nov 2014 at 19:48, "Emmanuel Thomé" wrote: > > Thanks a lot for your analysis. This seems consistent with what I can > obtain by playing

Re: [OMPI users] File-backed mmaped I/O and openib btl.

2014-11-11 Thread Emmanuel Thomé
; node 0 iteration 3, lead word received from peer is 0x1001 [ok] > > I don't know enough about memory hooks or the registration cache > implementation to speak with any authority, but it looks like this is where > the issue resides. As a workaround, can you try your original e

Re: [OMPI users] File-backed mmaped I/O and openib btl.

2014-11-11 Thread Emmanuel Thomé
just gets zeroes). I attach the simplified test case. I hope someone will be able to reproduce the problem. Best regards, E. On Mon, Nov 10, 2014 at 5:48 PM, Emmanuel Thomé wrote: > Thanks for your answer. > > On Mon, Nov 10, 2014 at 4:31 PM, Joshua Ladd wrote: >> Just really qui

Re: [OMPI users] File-backed mmaped I/O and openib btl.

2014-11-10 Thread Emmanuel Thomé
in reduce_scatter and allgather in the code. Collectives are with communicators of 2 nodes, and we're talking (for the smallest failing run) 8kb per node (i.e. 16kb total for an allgather). E. > On Mon, Nov 10, 2014 at 9:29 AM, Emmanuel Thomé > wrote: >> >> Hi, >>

[OMPI users] File-backed mmaped I/O and openib btl.

2014-11-10 Thread Emmanuel Thomé
Hi, I'm stumbling on a problem related to the openib btl in openmpi-1.[78].*, and the (I think legitimate) use of file-backed mmaped areas for receiving data through MPI collective calls. A test case is attached. I've tried to make it reasonably small, although I recognize that it's not extra thi