Jeff Squyres wrote:
One additional question: are you using TCP as your communications network, and if so, do either of the nodes that you are running on have more than one TCP NIC? We recently fixed a bug for situations where at least one node is on multiple TCP networks, not all of which are shared by the nodes where the peer MPI processes are running. If this describes your network setup (e.g., a cluster where the head node has both a public and a private network, the compute nodes have only the private network, and your MPI job spans the head node and a compute node), can you try upgrading to the latest 1.0.2 release candidate tarball:

     http://www.open-mpi.org/software/ompi/v1.0/
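
In the meantime, a possible workaround for this kind of multi-NIC setup is to restrict Open MPI's TCP BTL to the one network that all nodes share, using the btl_tcp_if_include / btl_tcp_if_exclude MCA parameters. The interface names below (eth0 for the public NIC, eth1 for the private one) are only placeholders; substitute whatever your nodes actually use:

$ mpirun --mca btl_tcp_if_include eth1 -machinefile ../bhost -np 9 ./ng

or, equivalently, exclude the loopback and public interfaces:

$ mpirun --mca btl_tcp_if_exclude lo,eth0 -machinefile ../bhost -np 9 ./ng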

$ mpiexec -machinefile ../bhost -np 9 ./ng
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x6
[0] func:/opt/openmpi/1.0.2a9/lib/libopal.so.0 [0x2aaaac062d0c]
[1] func:/lib64/tls/libpthread.so.0 [0x3b8d60c320]
[2] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_proc_remove+0xb5) [0x2aaaae6e4c65]
[3] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so [0x2aaaae6e2b09]
[4] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_add_procs+0x157) [0x2aaaae6dfdd7]
[5] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x231) [0x2aaaae3cd1e1]
[6] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0x94) [0x2aaaae1b1f44]
[7] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(ompi_mpi_init+0x3af) [0x2aaaabdd2d7f]
[8] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(MPI_Init+0x93) [0x2aaaabdbeb33]
[9] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(MPI_INIT+0x28) [0x2aaaabdce948]
[10] func:./ng(MAIN__+0x38) [0x4022a8]
[11] func:./ng(main+0xe) [0x4126ce]
[12] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3b8cb1c4bb]
[13] func:./ng [0x4021da]
*** End of error message ***

Bye,
Czarek

