On Wed, 31 May 2006 20:17:33 -0600, Brian Barrett <brbar...@open-mpi.org>
wrote:
> Did you happen to have a chance to try to run the 1.0.3 or 1.1
> nightly tarballs? I'm 50/50 on whether we've fixed these issues
> already.
For Ticket #41:
Using Open MPI 1.0.3 and 1.1:
For some reason, I can't seem to get TCP to work with any number of nodes
> 1 (which is odd, because I've had it working on *this* system before;
MPICH works fine, so there's at least *something* right about the Ethernet
config/hardware).
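
(laten's source isn't attached here, but per its banner it's just a
Send/Recv latency test, so a rough stand-in that exercises the same
point-to-point path, assuming nothing about laten beyond that, would be:

/* minimal stand-in for laten: one Send/Recv round trip between rank 0
 * and rank 1, enough to force traffic over the tcp btl when the two
 * ranks land on different nodes */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, token = 42;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
            printf("round trip ok, token=%d\n", token);
        } else if (rank == 1) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}

built with mpicc and launched with the same mpirun lines shown below.)
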
But I do get a different error with the snapshots vs. 1.0.2:
*****Open MPI 1.0.2*****
[root@zartan1 1.0.2]# mpirun -v -np 6 -prefix $MPIHOME -machinefile
machines -mca btl tcp,sm,self laten -o 10
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x6
[0] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib/libopal.so.0 [0x2ab8333408ca]
[1] func:/lib64/libpthread.so.0 [0x2ab83394a380]
[2] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib64/openmpi/mca_btl_tcp.so(mca_btl_tcp_proc_remove+0xbb) [0x2ab8364299ab]
[3] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib64/openmpi/mca_btl_tcp.so [0x2ab836427bec]
[4] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib64/openmpi/mca_btl_tcp.so(mca_btl_tcp_add_procs+0x155) [0x2ab836425445]
*** End of error message ***
[5] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib64/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x26b) [0x2ab835da72db]
[6] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib64/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xcc) [0x2ab835b8bd5c]
[7] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib/libmpi.so.0(ompi_mpi_init+0x590) [0x2ab8330b1c90]
[8] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib/libmpi.so.0(MPI_Init+0x83) [0x2ab83309d2d3]
[9] func:laten(main+0x6a) [0x4015f2]
[10] func:/lib64/libc.so.6(__libc_start_main+0xdc) [0x2ab833a6f4cc]
[11] func:laten [0x4014f9]
*****Open MPI 1.0.3*****
[root@zartan1 tmp]# mpirun -v -np 4 -prefix $MPIHOME -mca btl tcp,sm,self
-machinefile machines laten -o 10
MPI Bidirectional latency test (Send/Recv)
Processes Max Latency (us)
------------------------------------------
[0,1,3][btl_tcp_endpoint.c:559:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=113
[0,1,2][btl_tcp_endpoint.c:559:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=113
*****Open MPI 1.1*****
[root@zartan1 1.1]# mpirun -v -np 4 -prefix $MPIHOME -mca btl tcp
-machinefile machines laten -o 10
MPI Bidirectional latency test (Send/Recv)
Processes Max Latency (us)
------------------------------------------
[0,1,2][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=113
[0,1,3][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=113
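
For reference, errno 113 on a Linux/glibc system is EHOSTUNREACH ("No
route to host"). A quick check, in case the value maps differently
elsewhere:

/* decode errno 113; assumes Linux/glibc, where 113 == EHOSTUNREACH */
#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    printf("errno 113 -> %s\n", strerror(113));    /* "No route to host" */
    printf("EHOSTUNREACH -> %d\n", EHOSTUNREACH);  /* 113 on Linux */
    return 0;
}
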
If I use -np 2 (i.e., the job doesn't leave the node, since it's a dual-CPU
machine), it works fine.
--
Troy Telford