The best options for debugging something like this are:

-mca odls_base_verbose 5 -mca errmgr_base_verbose 5
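
As a rough sketch of how those options might be passed on the mpirun command line: the rank count and the test binary name (./abort_test) below are placeholders assumed for illustration, not something from the original report.

```shell
# Verbosity options suggested above.
DEBUG_OPTS="-mca odls_base_verbose 5 -mca errmgr_base_verbose 5"

# Assumptions: a small 2-rank job and a hypothetical test binary.
NP=2
APP=./abort_test

# Assemble and print the full command line. To launch it for real,
# run the printed command and redirect its output to a file,
# since the verbose output can be long.
CMD="mpirun $DEBUG_OPTS -np $NP $APP"
echo "$CMD"
```

Keeping the rank count small should keep the verbose output manageable.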
It’ll generate a fair amount of output, so try to do it with a small job if you can. You’ll need a build configured with --enable-debug to get the output.

> On Feb 18, 2016, at 8:29 PM, Ben Menadue <ben.mena...@nci.org.au> wrote:
>
> Hi,
>
> I'm investigating an issue with mpirun *sometimes* hanging after programs
> call MPI_Abort... all of the MPI processes have terminated, however the
> mpirun is still there. This happens with 1.8.8 and 1.10.2. There look to be
> two threads, one in this path:
>
> #0 0x00007fa09c3143b3 in select () from /lib64/libc.so.6
> #1 0x00007fa09b001e2c in listen_thread (obj=0x7fa09b2109e8) at
> ../../../../../../../../orte/mca/oob/tcp/oob_tcp_listener.c:685
> #2 0x00007fa09c5ceaa1 in start_thread () from /lib64/libpthread.so.0
> #3 0x00007fa09c31b93d in clone () from /lib64/libc.so.6
>
> and the other in this:
>
> #0 0x00007fa09c312113 in poll () from /lib64/libc.so.6
> #1 0x00007fa09d318e7d in poll_dispatch (base=0x1568a80, tv=0x0) at
> ../../../../../../../../../opal/mca/event/libevent2021/libevent/poll.c:165
> #2 0x00007fa09d30d96c in opal_libevent2021_event_base_loop (base=0x1568a80,
> flags=1) at
> ../../../../../../../../../opal/mca/event/libevent2021/libevent/event.c:1633
> #3 0x00000000004056fc in orterun (argc=2, argv=0x7ffe70248078) at
> ../../../../../../../orte/tools/orterun/orterun.c:1142
> #4 0x0000000000403614 in main (argc=2, argv=0x7ffe70248078) at
> ../../../../../../../orte/tools/orterun/main.c:13
>
> But since this is in mpirun itself, I'm not sure how to delve deeper - is
> there an MCA *_base_verbose parameter (or equivalent) that works on the
> mpirun?
>
> Cheers,
> Ben