> What I need is the backtrace on the process that generates the
> segfault. Second, in order to understand the backtrace, it's
> better to run a debug version of Open MPI. Without the
> debug version we only see the address where the fault occurs,
> without access to the line number ...

How about this? This is the section I was stepping through to reach
the first error I usually run into: "mx_connect fail for node-1:0 with
key aaaaffff (error Endpoint closed or not connectable!)"

// gdb output

Breakpoint 1, 0x00002ac856bd92e0 in opal_progress ()
   from /usr/local/openmpi-1.2b3r13030/lib/libopen-pal.so.0
(gdb) s
Single stepping until exit from function opal_progress, 
which has no line number information.
0x00002ac857361540 in sched_yield () from /lib/libc.so.6
(gdb) s
Single stepping until exit from function sched_yield, 
which has no line number information.
opal_condition_wait (c=0x5098e0, m=0x5098a0) at condition.h:80
80              while (c->c_signaled == 0) {
(gdb) s
81                  opal_progress();
(gdb) s

Breakpoint 1, 0x00002ac856bd92e0 in opal_progress ()
   from /usr/local/openmpi-1.2b3r13030/lib/libopen-pal.so.0
(gdb) s
Single stepping until exit from function opal_progress, 
which has no line number information.
0x00002ac857361540 in sched_yield () from /lib/libc.so.6
(gdb) backtrace
#0  0x00002ac857361540 in sched_yield () from /lib/libc.so.6
#1  0x0000000000402f60 in opal_condition_wait (c=0x5098e0, m=0x5098a0)
    at condition.h:81
#2  0x0000000000402b3c in orterun (argc=17, argv=0x7fff54151088)
    at orterun.c:427
#3  0x0000000000402713 in main (argc=17, argv=0x7fff54151088) at main.c:13

--- This is the mpirun output from while I was stepping through it. The
error at the end is the one I hit at the point shown in the backtrace above.

[node-2:11909] top: openmpi-sessions-ggrobe@node-2_0
[node-2:11909] tmp: /tmp
[node-1:10719] procdir: /tmp/openmpi-sessions-ggrobe@node-1_0/default-universe-17414/1/0
[node-1:10719] jobdir: /tmp/openmpi-sessions-ggrobe@node-1_0/default-universe-17414/1
[node-1:10719] unidir: /tmp/openmpi-sessions-ggrobe@node-1_0/default-universe-17414
[node-1:10719] top: openmpi-sessions-ggrobe@node-1_0
[node-1:10719] tmp: /tmp
[juggernaut:17414] spawn: in job_state_callback(jobid = 1, state = 0x4)
[juggernaut:17414] Info: Setting up debugger process table for applications
  MPIR_being_debugged = 0
  MPIR_debug_gate = 0
  MPIR_debug_state = 1
  MPIR_acquired_pre_main = 0
  MPIR_i_am_starter = 0
  MPIR_proctable_size = 6
  MPIR_proctable:
    (i, host, exe, pid) = (0, node-1, /home/ggrobe/Projects/ompi/cpi/./cpi, 10719)
    (i, host, exe, pid) = (1, node-1, /home/ggrobe/Projects/ompi/cpi/./cpi, 10720)
    (i, host, exe, pid) = (2, node-1, /home/ggrobe/Projects/ompi/cpi/./cpi, 10721)
    (i, host, exe, pid) = (3, node-1, /home/ggrobe/Projects/ompi/cpi/./cpi, 10722)
    (i, host, exe, pid) = (4, node-2, /home/ggrobe/Projects/ompi/cpi/./cpi, 11908)
    (i, host, exe, pid) = (5, node-2, /home/ggrobe/Projects/ompi/cpi/./cpi, 11909)
[node-1:10718] sess_dir_finalize: proc session dir not empty - leaving
[node-1:10718] sess_dir_finalize: proc session dir not empty - leaving
[node-1:10721] procdir: /tmp/openmpi-sessions-ggrobe@node-1_0/default-universe-17414/1/2
[node-1:10721] jobdir: /tmp/openmpi-sessions-ggrobe@node-1_0/default-universe-17414/1
[node-1:10721] unidir: /tmp/openmpi-sessions-ggrobe@node-1_0/default-universe-17414
[node-1:10721] top: openmpi-sessions-ggrobe@node-1_0
[node-1:10721] tmp: /tmp
[node-1:10720] mx_connect fail for node-1:0 with key aaaaffff (error Endpoint closed or not connectable!)