Hi,
Thanks Jeff for your reply, and sorry for this late follow-up...
On Sun, Aug 11, 2019 at 02:27:53PM -0700, Jeff Hammond wrote:
> > openmpi-4.0.1 gives essentially the same results (similar files
> > attached), but with various doubts on my part as to whether I've run this
> > check correctly.
Hi,
In the attached program, the MPI_Allgather() call fails to communicate
all data (the amount it communicates wraps around at 4G...). I'm running
on an Omni-Path cluster (2018 hardware), with openmpi 3.1.3 or 4.0.1 (tested
both). With the OFI mtl, the failure is silent, with no error message reported.
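For illustration, here is a minimal sketch of the kind of call involved. It is
not the original attachment; the buffer sizes, the fill pattern and the 1 MiB
helper datatype are made up so that the gathered payload exceeds 4 GiB on 2 ranks.
-----
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* 2.5 GiB per rank: with 2 ranks the gathered payload is ~5 GiB,
     * i.e. beyond the 4 GiB wrap-around described above. */
    size_t chunk = (size_t)5 << 29;
    char *sendbuf = malloc(chunk);
    char *recvbuf = malloc(chunk * (size_t)size);
    if (!sendbuf || !recvbuf) MPI_Abort(MPI_COMM_WORLD, 1);
    memset(sendbuf, 0x42 + rank, chunk);

    /* keep the int count small by gathering 1 MiB blocks */
    MPI_Datatype blk;
    MPI_Type_contiguous(1 << 20, MPI_BYTE, &blk);
    MPI_Type_commit(&blk);
    int count = (int)(chunk >> 20);

    MPI_Allgather(sendbuf, count, blk, recvbuf, count, blk, MPI_COMM_WORLD);

    /* check the last byte received from each peer */
    for (int p = 0; p < size; p++) {
        unsigned char got = (unsigned char)recvbuf[(size_t)(p + 1) * chunk - 1];
        if (got != (unsigned char)(0x42 + p))
            printf("rank %d: data from rank %d is wrong (0x%02x)\n",
                   rank, p, got);
    }

    MPI_Type_free(&blk);
    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
-----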
Hi,
I came across the following issue with openmpi-4.0.1, compiled with:
../openmpi-4.0.1/configure --disable-mpi-fortran --without-cuda
--disable-opencl --with-ucx=/path/to/ucx-1.5.1
The execution of the attached program (a simple mpi_send / mpi_recv pair)
gives a segfault when the message size exceeds 2^30. I'm seeing [...]
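For reference, a minimal sketch of such a send/recv pair (not the original
attachment; the size is illustrative, just over 2^30 bytes, to be run on 2 ranks):
-----
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* a bit more than 1 GiB, i.e. just past the 2^30 threshold */
    size_t n = ((size_t)1 << 30) + (1 << 20);
    char *buf = malloc(n);
    if (!buf) MPI_Abort(MPI_COMM_WORLD, 1);

    if (rank == 0) {
        memset(buf, 0x17, n);
        MPI_Send(buf, (int)n, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        memset(buf, 0, n);
        MPI_Recv(buf, (int)n, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("last byte received: 0x%02x\n", (unsigned char)buf[n - 1]);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
-----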
> Note, too, that 1.10.2 has a bug in which one of the core Open MPI libs has a
> dependency on libibverbs (only Open MPI's plugins are supposed to be
> dependent upon libibverbs). This was a mistake that is fixed in the 1.10.3
> nightly tarballs. Indeed, fixing this bug may have the side-effect of [...]
Thanks for your analysis.
On Sat, Feb 27, 2016 at 3:19 PM, Jeff Squyres (jsquyres)
wrote:
> [...]
> 1. osmcomp should not have installed a .la file for a default linker location
Probably not, although the no-brainer default solution does this
(plus, the .la files say "do not delete"...).
> 2. [...]
Here you go.
http://www.loria.fr/~thome/vrac/logs.tar.bz2
E.
On Sat, Feb 27, 2016 at 2:56 PM, Jeff Squyres (jsquyres)
wrote:
> Can you send all the build information listed here:
>
> https://www.open-mpi.org/community/help/
>
>
>
>> On Feb 27, 2016, at 8:48 AM [...]
> By any chance, does libosmcomp.la contain a -rpath line?
>
> FWIW, you can simply run
> make V=1
> in order to see how libtool is invoked, and how it will invoke bcc
>
> Cheers,
>
> Gilles
>
> On Saturday, February 27, 2016, Emmanuel Thomé
> wrote:
>>
>> Hi,
Hi,
I attach both $builddir/ompi/libmpi.la and /usr/lib/libosmcomp.la
(both from a system where I kept that file).
/usr/lib/libosmcomp.la has no embedded rpath information. FWIW, this
.la file comes from the file
MLNX_OFED_LINUX-3.1-1.0.3-debian8.1-x86_64/DEBS/libopensm_4.6.0.MLNX20150830.c69ebab
>
>> On Feb 26, 2016, at 8:24 AM, Emmanuel Thomé wrote:
>>
>> On Fri, Feb 26, 2016 at 5:21 PM, Emmanuel Thomé
>> wrote:
>>> happens to have an openmpi-1.6.5 installation in /usr, as well as .
>>
>> So
On Fri, Feb 26, 2016 at 5:21 PM, Emmanuel Thomé
wrote:
> happens to have an openmpi-1.6.5 installation in /usr, as well as .
Sorry for copy-paste failure. 1.6.5 is only in /usr, of course.
E.
I have a problem with the build and install process of openmpi-1.10.2.
I have here a machine running Debian GNU/Linux 8.2; this machine also
happens to have an openmpi-1.6.5 installation in /usr, as well as .
This should not matter, but here it does.
The machine also has an InfiniBand software stack [...]
> I filed https://github.com/open-mpi/ompi/issues/299; feel free to follow
> it with your github account to follow the progress.
>
>
>
> On Nov 29, 2014, at 8:49 AM, Emmanuel Thomé wrote:
>
>> Hi,
>>
>> I am still affected by the bug which I reported in the thread [...]
On Thu, Nov 13, 2014 at 7:09 PM, Emmanuel Thomé
wrote:
> Hi,
>
> It turns out that the DT_NEEDED libs for my a.out are:
> Dynamic Section:
>   NEEDED      libmpi.so.1
>   NEEDED      libpthread.so.0
>   NEEDED      libc.so.6
> which is absolutely [...]
On Wed, Nov 12, 2014 at 7:51 PM, Emmanuel Thomé
wrote:
> Yes, I confirm. Thanks for pointing out that this is the intended behaviour.
>
> In the binary, the code goes to munmap@plt, which goes to the libc,
> not to libopen-pal.so
>
> libc is 2.13-38+deb7u1
>
> I'm a total noob [...]
[...] this (1-line) function:
>
> ----
> /* intercept munmap, as the user can give back memory that way as well. */
> OPAL_DECLSPEC int munmap(void* addr, size_t len)
> {
>     return opal_memory_linux_free_ptmalloc2_munmap(addr, len, 0);
> }
> ----
>
>
>
> On Nov 12, 2014, at 11:08
[...] consider any mmap()/munmap() rather unsafe to play with in an
Open MPI application.
E.
P.S.: the latest version of the test case is attached.
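As a side note on the symbol-resolution question discussed above (whether the
binary's munmap@plt goes to libc or to libopen-pal.so), here is a rough
diagnostic sketch, not from the thread: it asks the dynamic linker which loaded
object actually provides munmap. Compile with mpicc; add -ldl on older glibc.
-----
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);          /* loads libopen-pal and friends */

    /* Look up munmap the way the dynamic linker does (global scope),
     * then ask which loaded object that address belongs to. */
    void *addr = dlsym(RTLD_DEFAULT, "munmap");
    Dl_info info;
    if (addr && dladdr(addr, &info) && info.dli_fname)
        printf("munmap resolves into: %s\n", info.dli_fname);
    else
        printf("could not locate munmap\n");

    MPI_Finalize();
    return 0;
}
-----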
On Nov 11, 2014, at 19:48, "Emmanuel Thomé" wrote:
>
> Thanks a lot for your analysis. This seems consistent with what I can
> obtain by playing
[...] node 0 iteration 3, lead word received from peer is 0x1001 [ok]
>
> I don't know enough about memory hooks or the registration cache
> implementation to speak with any authority, but it looks like this is where
> the issue resides. As a workaround, can you try your original [...]
[...] just gets zeroes).
I attach the simplified test case. I hope someone will be able to
reproduce the problem.
Best regards,
E.
On Mon, Nov 10, 2014 at 5:48 PM, Emmanuel Thomé
wrote:
> Thanks for your answer.
>
> On Mon, Nov 10, 2014 at 4:31 PM, Joshua Ladd wrote:
>> Just really quickly [...]
[...] in reduce_scatter and allgather in the code.
Collectives use communicators of 2 nodes, and we're talking (for the
smallest failing run) about 8 kB per node (i.e. 16 kB total for an
allgather).
E.
> On Mon, Nov 10, 2014 at 9:29 AM, Emmanuel Thomé
> wrote:
>>
>> Hi,
>>
Hi,
I'm stumbling on a problem related to the openib btl in
openmpi-1.[78].*, and the (I think legitimate) use of file-backed
mmap()ed areas for receiving data through MPI collective calls.
A test case is attached. I've tried to make it reasonably small,
although I recognize that it's not extra thin [...]
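For reference, a minimal sketch of the pattern the test case exercises, as
described above: the result of an 8 kB-per-node allgather is received directly
into a file-backed mmap()ed region, which is unmapped and remapped between
iterations. This is not the attached test case; the file name, iteration count
and fill pattern (0x1000 + rank, echoing the log line quoted earlier) are made up.
-----
#include <mpi.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    size_t chunk = 8192;                    /* 8 kB contributed per node */
    size_t total = chunk * (size_t)size;
    long *sendbuf = malloc(chunk);
    if (!sendbuf) MPI_Abort(MPI_COMM_WORLD, 1);
    for (size_t i = 0; i < chunk / sizeof(long); i++)
        sendbuf[i] = 0x1000 + rank;

    char fname[64];
    snprintf(fname, sizeof fname, "/tmp/allgather-out.%d", rank);

    for (int iter = 0; iter < 4; iter++) {
        /* fresh file-backed mapping for every iteration */
        int fd = open(fname, O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0 || ftruncate(fd, (off_t)total) < 0)
            MPI_Abort(MPI_COMM_WORLD, 1);
        long *recvbuf = mmap(NULL, total, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);
        if (recvbuf == MAP_FAILED)
            MPI_Abort(MPI_COMM_WORLD, 1);

        MPI_Allgather(sendbuf, (int)chunk, MPI_BYTE,
                      recvbuf, (int)chunk, MPI_BYTE, MPI_COMM_WORLD);

        /* check the leading word received from each peer */
        for (int p = 0; p < size; p++)
            printf("node %d iteration %d, lead word from peer %d is 0x%lx\n",
                   rank, iter, p,
                   (unsigned long)recvbuf[p * (chunk / sizeof(long))]);

        munmap(recvbuf, total);             /* memory handed back to the OS */
        close(fd);
    }

    unlink(fname);
    free(sendbuf);
    MPI_Finalize();
    return 0;
}
-----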