Jeff Squyres wrote:
One additional question: are you using TCP as your communications
network, and if so, does either of the nodes that you are running on
have more than one TCP NIC? We recently fixed a bug for situations
where at least one node is on multiple TCP networks, not all of which
were shared by the nodes where the peer MPI processes were running.
If this situation describes your network setup (e.g., a cluster where
the head node has a public and a private network, and where the
cluster nodes have only a private network -- and your MPI processes were
running on the head node and a compute node), can you try upgrading
to the latest 1.0.2 release candidate tarball?
http://www.open-mpi.org/software/ompi/v1.0/
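
For reference, one quick way to check whether this multi-NIC situation applies, and to pin the TCP BTL to a single interface that all nodes share while testing, would be something like the following (this assumes the standard btl_tcp_if_include MCA parameter; eth0 is only a placeholder for the shared interface name):

$ /sbin/ifconfig -a   # list the TCP interfaces on the head node and on a compute node
$ mpiexec --mca btl_tcp_if_include eth0 -machinefile ../bhost -np 9 ./ng

The run and the resulting traceback: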
$ mpiexec -machinefile ../bhost -np 9 ./ng
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x6
[0] func:/opt/openmpi/1.0.2a9/lib/libopal.so.0 [0x2aaaac062d0c]
[1] func:/lib64/tls/libpthread.so.0 [0x3b8d60c320]
[2] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_proc_remove+0xb5) [0x2aaaae6e4c65]
[3] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so [0x2aaaae6e2b09]
[4] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_add_procs+0x157) [0x2aaaae6dfdd7]
[5] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x231) [0x2aaaae3cd1e1]
[6] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0x94) [0x2aaaae1b1f44]
[7] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(ompi_mpi_init+0x3af) [0x2aaaabdd2d7f]
[8] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(MPI_Init+0x93) [0x2aaaabdbeb33]
[9] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(MPI_INIT+0x28) [0x2aaaabdce948]
[10] func:./ng(MAIN__+0x38) [0x4022a8]
[11] func:./ng(main+0xe) [0x4126ce]
[12] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3b8cb1c4bb]
[13] func:./ng [0x4021da]
*** End of error message ***
Bye,
Czarek