On Wed, 31 May 2006 20:17:33 -0600, Brian Barrett <brbar...@open-mpi.org> wrote:

> Did you happen to have a chance to try to run the 1.0.3 or 1.1
> nightly tarballs?  I'm 50/50 on whether we've fixed these issues
> already.

For Ticket #41:

Using Open MPI 1.0.3 and 1.1:
For some reason, I can't seem to get TCP to work with any number of nodes > 1
(which is odd, because I've had it working on *this* system before; MPICH
works fine, so there's at least *something* right about the ethernet
config/hardware).
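
A guess I haven't verified yet: Open MPI's TCP BTL tries every interface it
finds, so an extra interface on the nodes that isn't routable between them
could produce failures like this even when the main ethernet is fine.
Restricting the BTL to the known-good interface should rule that out; "eth0"
below is just a placeholder for whatever interface actually carries the
traffic:

  mpirun -np 4 -prefix $MPIHOME -machinefile machines \
      -mca btl tcp,sm,self -mca btl_tcp_if_include eth0 laten -o 10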

But I do get a different error with the snapshots vs. 1.0.2:

*****Open MPI 1.0.2*****
[root@zartan1 1.0.2]# mpirun -v -np 6 -prefix $MPIHOME -machinefile machines -mca btl tcp,sm,self laten -o 10
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x6
[0] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib/libopal.so.0 [0x2ab8333408ca]
[1] func:/lib64/libpthread.so.0 [0x2ab83394a380]
[2] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib64/openmpi/mca_btl_tcp.so(mca_btl_tcp_proc_remove+0xbb) [0x2ab8364299ab]
[3] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib64/openmpi/mca_btl_tcp.so [0x2ab836427bec]
[4] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib64/openmpi/mca_btl_tcp.so(mca_btl_tcp_add_procs+0x155) [0x2ab836425445]
*** End of error message ***
[5] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib64/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x26b) [0x2ab835da72db]
[6] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib64/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xcc) [0x2ab835b8bd5c]
[7] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib/libmpi.so.0(ompi_mpi_init+0x590) [0x2ab8330b1c90]
[8] func:/usr/x86_64-gcc-4.0.0/openmpi-1.0.2/lib/libmpi.so.0(MPI_Init+0x83) [0x2ab83309d2d3]
[9] func:laten(main+0x6a) [0x4015f2]
[10] func:/lib64/libc.so.6(__libc_start_main+0xdc) [0x2ab833a6f4cc]
[11] func:laten [0x4014f9]

*****Open MPI 1.0.3*****
[root@zartan1 tmp]# mpirun -v -np 4 -prefix $MPIHOME -mca btl tcp,sm,self -machinefile machines laten -o 10
MPI Bidirectional latency test (Send/Recv)
             Processes    Max Latency (us)
------------------------------------------
[0,1,3][btl_tcp_endpoint.c:559:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113
[0,1,2][btl_tcp_endpoint.c:559:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113

*****Open MPI 1.1*****
[root@zartan1 1.1]# mpirun -v -np 4 -prefix $MPIHOME -mca btl tcp -machinefile machines laten -o 10
MPI Bidirectional latency test (Send/Recv)
             Processes    Max Latency (us)
------------------------------------------
[0,1,2][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113
[0,1,3][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113

If I use -np 2 (i.e., the job doesn't leave the node, since it's a dual-CPU
machine), it works fine.
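
For reference, errno 113 on Linux is EHOSTUNREACH ("No route to host").  A
quick way to double-check that mapping on the nodes themselves, assuming
perl is installed (just a sanity check, nothing Open MPI specific):

  perl -e '$! = 113; print "$!\n"'    # prints: No route to host
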
--
Troy Telford
