Ralph wrote:
> We found a locking error in vader - this has been fixed in the OMPI
> master and will be in the 1.8.5 nightly tarball tomorrow.
I have now tested with the nightly tarball. The deadlocks are fixed. Thanks! The warning
[warn] opal_libevent2021_event_base_loop: reentrant invocation. Only one
I have now tried 1.8.5rc1. From my point of view it behaves very similarly
to 1.8.4 (and completely differently from 1.6.5). The warning
[warn] opal_libevent2021_event_base_loop: reentrant invocation. Only one
event_base_loop can run on each event_base at once.
is still there.
It's easy for me to (re)prod
Here is a stackdump from inside the debugger (because it gives filenames
and line numbers):
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f1eb6bfd700 (LWP 24847)]
0x00366aa79252 in _int_malloc () from /lib64/libc.so.6
(gdb) bt
#0 0x00366aa79252 in _int_mallo
The normal crash without Ctrl-Z can produce different stackdumps. With
Ctrl-Z, the stackdump nearly always looks as follows (in the debugger I
get source files and line numbers, so I guess it is built with debug info):
[wam-r02c01b02:19183] *** Process received signal ***
[wam-r02c01b02:19183] S
> You might double-check by running with "--mca btl ^openib" to see if that
> is the source of the warning
The warning always appears, independent of the interconnect, even when
running with "--mca btl ^openib".
> Does it only crash when you pause it? Or does it crash while normally
> running?
> 2. Unable to resolve: can you be more specific on this?
This was my mistake. I used "xxx.yyy.zzz" instead of "localhost" in the
startup options for orterun. (More precisely, the GUI did it, but I knew
that code.) I have no idea how 1.6.5 managed to get around the fact that not even
"dig xxx.yyy.zzz" can
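Since `dig` only queries DNS, a check that mirrors how most programs look up names may be more useful before handing a host name to orterun. A minimal Python sketch, assuming a POSIX system; `host_resolves` is a hypothetical helper, not part of Open MPI. It uses the standard `socket.getaddrinfo`, which also consults /etc/hosts and other NSS sources, so it can succeed for names that `dig` alone would not resolve:

```python
import socket


def host_resolves(hostname: str) -> bool:
    """Return True if the system resolver can map `hostname` to an address.

    Hypothetical helper. Unlike `dig`, getaddrinfo() goes through the
    normal system lookup path (/etc/hosts, DNS, other NSS sources).
    """
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The resolver could not find the name (e.g. a misspelled host).
        return False
    except UnicodeError:
        # The name is malformed (e.g. an empty label) and was rejected
        # before any lookup was attempted.
        return False
    return True
```

On a normally configured machine `host_resolves("localhost")` should be True, while a name that not even `dig` can resolve would typically give False.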
I have a nasty bug in my software and can make it crash by stopping it with
Ctrl-Z, waiting many seconds, and then typing "fg" to continue the run.
At least, it crashes when I start it on 3 workers with 24 instances each
plus a master. On 2 workers with 24 instances each, it doesn't crash.
I dec
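The pause/resume sequence can also be scripted so that reproducing the crash does not depend on typing at a terminal. A minimal Python sketch, assuming a POSIX system. Note that it sends SIGSTOP, whereas Ctrl-Z in a shell sends SIGTSTP to the whole foreground process group, so this is an approximation rather than an exact replay. `pause_and_resume` and `demo` are hypothetical helpers, and the child started by `demo` is only a stand-in for the real job. Whether to target orterun or the individual ranks, and whether the launcher forwards these signals, is not addressed here:

```python
from __future__ import annotations

import os
import signal
import subprocess
import sys
import time


def pause_and_resume(pid: int, pause_seconds: float) -> tuple[bool, bool]:
    """Stop a child process, wait, then let it continue (like Ctrl-Z ... fg).

    Hypothetical helper. Uses SIGSTOP (cannot be caught) instead of the
    SIGTSTP that Ctrl-Z sends. `pid` must be a direct child of this
    process, because the stop/continue events are read with waitpid().
    Returns (was_stopped, was_continued).
    """
    os.kill(pid, signal.SIGSTOP)
    _, status = os.waitpid(pid, os.WUNTRACED)   # blocks until the child stops
    was_stopped = os.WIFSTOPPED(status)

    time.sleep(pause_seconds)                   # the "waiting many seconds" part

    os.kill(pid, signal.SIGCONT)
    _, status = os.waitpid(pid, os.WCONTINUED)  # blocks until the child resumes
    was_continued = os.WIFCONTINUED(status)
    return was_stopped, was_continued


def demo(pause_seconds: float = 1.0) -> tuple[bool, bool]:
    """Pause a short-lived Python child (a stand-in for the real MPI job).

    The child sleeps for 3 seconds, so pause_seconds should stay well
    below that; otherwise the child may exit before the continue event
    is collected.
    """
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(3)"])
    try:
        return pause_and_resume(child.pid, pause_seconds)
    finally:
        child.wait()
```

Because the events are collected with `waitpid()`, the helper only works for a direct child; for an already running job, `os.kill` with SIGSTOP and later SIGCONT on its PID does the same pause/resume without the confirmation step.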