Hi,

I have been looking at this on and off for some time now, but I have
been unable to make much progress on it.

Whatever the bug is, it causes heap corruption, which in turn makes
things segfault before MPI_Init returns. Most of the time it
segfaults, but sometimes it prints other openmpi errors, and sometimes
glibc aborts after detecting the heap corruption.

There is probably a data race between threads here as well. If I run
the test program on an idle machine with "taskset 1" (so it only runs
on one CPU), the errors go away. I expect this is why the specific
error message is not 100% reproducible for me.
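
For anyone who wants to script the single-CPU comparison, here is a
rough sketch of what "taskset 1" does, written in Python. It is
Linux-only, the helper name run_pinned is made up for illustration, and
it picks the lowest CPU the process is allowed to use (CPU 0 on an
unrestricted machine, which is what the mask 1 selects).

```python
import os
import subprocess
import sys


def run_pinned(argv):
    """Run argv restricted to a single CPU, roughly like `taskset 1 argv...`.

    Hypothetical helper for illustration; Linux only.
    """
    # Lowest CPU this process may run on. On an unrestricted machine
    # this is CPU 0, the CPU selected by the affinity mask 0x1.
    cpu = min(os.sched_getaffinity(0))

    def pin():
        # Executed in the child after fork and before exec, so only
        # the launched program is pinned, not the calling script.
        os.sched_setaffinity(0, {cpu})

    return subprocess.run(argv, preexec_fn=pin)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 pin.py ./some_test_program
    sys.exit(run_pinned(sys.argv[1:]).returncode)
```

Running the test program repeatedly with and without pinning and
comparing the failure counts would show whether the errors really
depend on the threads running on more than one CPU.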

On Mon, 23 Jan 2017 23:20:23 +0800 YunQiang Su <[email protected]> wrote:
> It is quite strange that it won't fail if run with gdb.

I guess this is due to gdb slowing down certain threads.

Thanks,
James

