The best options for debugging something like this are, on the mpirun command
line: -mca odls_base_verbose 5 -mca errmgr_base_verbose 5

It’ll generate a fair amount of output, so try it with a small job if you 
can. You’ll need a build configured with --enable-debug to get the output.
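
For example, a small run with both options might look like the sketch below. The program name ./abort_test, the rank count of 2, and the log file name are placeholders I made up for illustration; substitute whatever small job reproduces the hang for you.

```shell
# Verbosity options from above, collected in one variable.
MCA_DEBUG="-mca odls_base_verbose 5 -mca errmgr_base_verbose 5"

# ./abort_test is a hypothetical small program that calls MPI_Abort.
if command -v mpirun >/dev/null 2>&1 && [ -x ./abort_test ]; then
    # $MCA_DEBUG is left unquoted on purpose so it splits into separate arguments.
    # Capture stdout and stderr together, since the output can be large.
    mpirun -np 2 $MCA_DEBUG ./abort_test 2>&1 | tee mpirun_debug.log
else
    # Fallback so the snippet is harmless where mpirun or the test program is missing.
    echo "would run: mpirun -np 2 $MCA_DEBUG ./abort_test"
fi
```

Saving the output to a file makes it easier to compare a run that hangs with one that exits cleanly.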


> On Feb 18, 2016, at 8:29 PM, Ben Menadue <ben.mena...@nci.org.au> wrote:
> 
> Hi,
> 
> I'm investigating an issue with mpirun *sometimes* hanging after programs
> call MPI_Abort... all of the MPI processes have terminated, however the
> mpirun is still there. This happens with 1.8.8 and 1.10.2. There look to be
> two threads, one in this path:
> 
> #0  0x00007fa09c3143b3 in select () from /lib64/libc.so.6
> #1  0x00007fa09b001e2c in listen_thread (obj=0x7fa09b2109e8) at
> ../../../../../../../../orte/mca/oob/tcp/oob_tcp_listener.c:685
> #2  0x00007fa09c5ceaa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x00007fa09c31b93d in clone () from /lib64/libc.so.6
> 
> and the other in this:
> 
> #0  0x00007fa09c312113 in poll () from /lib64/libc.so.6
> #1  0x00007fa09d318e7d in poll_dispatch (base=0x1568a80, tv=0x0) at
> ../../../../../../../../../opal/mca/event/libevent2021/libevent/poll.c:165
> #2  0x00007fa09d30d96c in opal_libevent2021_event_base_loop (base=0x1568a80,
> flags=1) at
> ../../../../../../../../../opal/mca/event/libevent2021/libevent/event.c:1633
> #3  0x00000000004056fc in orterun (argc=2, argv=0x7ffe70248078) at
> ../../../../../../../orte/tools/orterun/orterun.c:1142
> #4  0x0000000000403614 in main (argc=2, argv=0x7ffe70248078) at
> ../../../../../../../orte/tools/orterun/main.c:13
> 
> But since this is in mpirun itself, I'm not sure how to delve deeper - is
> there an MCA *_base_verbose parameter (or equivalent) that works on the
> mpirun?
> 
> Cheers,
> Ben
> 
> 
