Hi all,

I built the 1.0.2a9 release candidate on Fedora Core 5 for x86_64, on a cluster set up as described below, and I see the same behavior when I try to run a job. Any ideas on the cause?

Jeff Squyres wrote:
> One additional question: are you using TCP as your communications
> network, and if so, do either of the nodes that you are running on
> have more than one TCP NIC? We recently fixed a bug for situations
> where at least one node is on multiple TCP networks, not all of which
> were shared by the nodes where the peer MPI processes were running.
> If this situation describes your network setup (e.g., a cluster where
> the head node has a public and a private network, and where the
> cluster nodes only have a private network -- and your MPI process was
> running on the head node and a compute node), can you try upgrading
> to the latest 1.0.2 release candidate tarball:
>
> http://www.open-mpi.org/software/ompi/v1.0/
>
>
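If it matters, I have not yet ruled out the multi-NIC situation described above (our head node is on both a public and a private network). I understand the TCP BTL can be restricted to a single interface via the btl_tcp_if_include MCA parameter, so something like the following (assuming the private network shared by all nodes is on eth0; adjust for your interface name) should tell whether the extra network is involved:

$ mpiexec --mca btl_tcp_if_include eth0 -machinefile ../bhost -np 9 ./ng

Would that be a reasonable way to rule the multi-network issue in or out?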
$ mpiexec -machinefile ../bhost -np 9 ./ng
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x6
[0] func:/opt/openmpi/1.0.2a9/lib/libopal.so.0 [0x2aaaac062d0c]
[1] func:/lib64/tls/libpthread.so.0 [0x3b8d60c320]
[2] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_proc_remove+0xb5) [0x2aaaae6e4c65]
[3] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so [0x2aaaae6e2b09]
[4] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_add_procs+0x157) [0x2aaaae6dfdd7]
[5] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x231) [0x2aaaae3cd1e1]
[6] func:/opt/openmpi/1.0.2a9/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0x94) [0x2aaaae1b1f44]
[7] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(ompi_mpi_init+0x3af) [0x2aaaabdd2d7f]
[8] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(MPI_Init+0x93) [0x2aaaabdbeb33]
[9] func:/opt/openmpi/1.0.2a9/lib/libmpi.so.0(MPI_INIT+0x28) [0x2aaaabdce948]
[10] func:./ng(MAIN__+0x38) [0x4022a8]
[11] func:./ng(main+0xe) [0x4126ce]
[12] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3b8cb1c4bb]
[13] func:./ng [0x4021da]
*** End of error message ***
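In case it helps to narrow this down: since the crash happens inside MPI_Init itself (frames [2] through [9] above), I would expect any init-only program to hit it too, independent of anything in ng. A minimal sketch (plain C, nothing application-specific) that should reproduce it on the same hosts:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;

    /* The backtrace above fails in here, during TCP BTL setup. */
    MPI_Init(&argc, &argv);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: MPI_Init succeeded\n", rank);

    MPI_Finalize();
    return 0;
}

Compiled and launched the same way as the real job:

$ mpicc init_test.c -o init_test
$ mpiexec -machinefile ../bhost -np 9 ./init_test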

Bye,
Czarek
