> >> PS: Is there any way you can attach to the processes with gdb ? I
> >> would like to see the backtrace as showed by gdb in order to be able
> >> to figure out what's wrong there.
> >
I found out that all processes on the 2nd node crash, so I just put a 30 second wait before MPI_Init in order to attach gdb and go from there.

The code in cpi starts off as follows (in order to show where the SIGTERM below is coming from).

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    MPI_Get_processor_name(processor_name,&namelen);

---
Attaching to process 11856
Reading symbols from /home/ggrobe/Projects/ompi/cpi/cpi...done.
Using host libthread_db library "/lib/libthread_db.so.1".
Reading symbols from /usr/local/openmpi-1.2b3r13030/lib/libmpi.so.0...done.
Loaded symbols for /usr/local/openmpi-1.2b3r13030/lib/libmpi.so.0
Reading symbols from /usr/local/openmpi-1.2b3r13030/lib/libopen-rte.so.0...done.
Loaded symbols for /usr/local/openmpi-1.2b3r13030/lib/libopen-rte.so.0
Reading symbols from /usr/local/openmpi-1.2b3r13030/lib/libopen-pal.so.0...done.
Loaded symbols for /usr/local/openmpi-1.2b3r13030/lib/libopen-pal.so.0
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib64/libutil.so.1...done.
Loaded symbols for /lib/libutil.so.1
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib64/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 46974166086512 (LWP 11856)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00002ab90661e880 in nanosleep () from /lib/libc.so.6
(gdb) break MPI_Init
Breakpoint 1 at 0x2ab905c0c880
(gdb) break MPI_Comm_size
Breakpoint 2 at 0x2ab905c01af0
(gdb) continue
Continuing.
[Switching to Thread 46974166086512 (LWP 11856)]

Breakpoint 1, 0x00002ab905c0c880 in PMPI_Init () from /usr/local/openmpi-1.2b3r13030/lib/libmpi.so.0
(gdb) n
Single stepping until exit from function PMPI_Init,
which has no line number information.
[New Thread 1082132816 (LWP 11862)]

Program received signal SIGTERM, Terminated.
0x00002ab906643f47 in ioctl () from /lib/libc.so.6
(gdb) backtrace
#0  0x00002ab906643f47 in ioctl () from /lib/libc.so.6
Cannot access memory at address 0x7fffa50102f8
---

Does this help in any way?
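
For anyone who wants to reproduce the attach step, here is a minimal sketch of the kind of wait that can go in front of MPI_Init. It is not the exact code used in cpi: the helper name wait_for_debugger is hypothetical, and printing the PID and host name is an added convenience so you know which process to attach to on which node.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of cpi or Open MPI): report this
 * process's PID and host on stderr, then sleep for `seconds` so that
 * gdb can be attached from another terminal. Returns the PID.
 */
static long wait_for_debugger(unsigned int seconds)
{
    char host[256];
    long pid = (long)getpid();
    unsigned int left = seconds;

    if (gethostname(host, sizeof(host)) != 0) {
        snprintf(host, sizeof(host), "unknown");
    }
    host[sizeof(host) - 1] = '\0';  /* gethostname may not terminate on truncation */

    fprintf(stderr, "PID %ld on %s: waiting %u s for gdb attach\n",
            pid, host, seconds);
    fflush(stderr);

    /* sleep() returns the unslept seconds if a signal handler cut it
     * short, so keep sleeping until the full interval has elapsed. */
    while (left > 0) {
        left = sleep(left);
    }
    return pid;
}

/*
 * Intended use at the top of main(), before any MPI call:
 *
 *     wait_for_debugger(30);
 *     MPI_Init(&argc, &argv);
 */
```

While the process is sleeping, attach with `gdb -p <pid>` on that node (or `attach <pid>` from inside gdb), set the breakpoints, and `continue`.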