That backtrace shows it's actually failing in Open MPI's shared memory (sm) BTL, not in the InfiniBand code.

But to answer your question, yes, Open MPI 1.2 did have IB support.
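
If you want to sanity-check that your particular build actually contains the openib BTL, something like this should list the BTL components that were compiled in (assuming the ompi_info from your 1.2.9 install is the one on your PATH):

    ompi_info | grep btl

If openib shows up in that list, IB support was built in.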

That being said, I have no idea what would cause this shared memory segv. It's quite possible that it's simple bit rot: v1.2.9 was released 9 years ago (see https://www.open-mpi.org/software/ompi/versions/timeline.php), and it may simply not function correctly on modern glibc / Linux kernel-based platforms.

Can you upgrade to a [much] newer Open MPI?
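
In the meantime, if you want to confirm that the problem really is confined to the shared memory BTL, you could try excluding it at run time and see whether the behavior changes. Just a suggestion; I haven't tried this on anything as old as v1.2.9:

    mpirun --mca btl ^sm -np 24 ./xhpl

(The -np 24 is just your original process count; excluding sm forces intra-node traffic over another BTL, e.g., tcp or openib, instead of shared memory.)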



> On Mar 19, 2018, at 8:29 PM, Kaiming Ouyang <kouya...@ucr.edu> wrote:
> 
> Hi everyone,
> Recently I needed to compile the High-Performance Linpack (HPL) benchmark with Open MPI 1.2 (a little bit old). When I finished compiling and tried to run it, I got the following errors:
> 
> [test:32058] *** Process received signal ***
> [test:32058] Signal: Segmentation fault (11)
> [test:32058] Signal code: Address not mapped (1)
> [test:32058] Failing at address: 0x14a2b84b6304
> [test:32058] [ 0] /lib64/libpthread.so.0(+0xf5e0) [0x14eb116295e0]
> [test:32058] [ 1] /root/research/lib/openmpi-1.2.9/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x28a) [0x14eaa81258aa]
> [test:32058] [ 2] /root/research/lib/openmpi-1.2.9/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x2b) [0x14eaa853219b]
> [test:32058] [ 3] /root/research/lib/openmpi-1.2.9/lib/libopen-pal.so.0(opal_progress+0x4a) [0x14eb128dbaaa]
> [test:32058] [ 4] /root/research/lib/openmpi-1.2.9/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_msg_wait+0x1d) [0x14eaf41e6b4d]
> [test:32058] [ 5] /root/research/lib/openmpi-1.2.9/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_recv+0x3a5) [0x14eaf41eac45]
> [test:32058] [ 6] /root/research/lib/openmpi-1.2.9/lib/libopen-rte.so.0(mca_oob_recv_packed+0x33) [0x14eb12b62223]
> [test:32058] [ 7] /root/research/lib/openmpi-1.2.9/lib/openmpi/mca_gpr_proxy.so(orte_gpr_proxy_put+0x1f9) [0x14eaf3dd7db9]
> [test:32058] [ 8] /root/research/lib/openmpi-1.2.9/lib/libopen-rte.so.0(orte_smr_base_set_proc_state+0x31d) [0x14eb12b7893d]
> [test:32058] [ 9] /root/research/lib/openmpi-1.2.9/lib/libmpi.so.0(ompi_mpi_init+0x8d6) [0x14eb13202136]
> [test:32058] [10] /root/research/lib/openmpi-1.2.9/lib/libmpi.so.0(MPI_Init+0x6a) [0x14eb1322461a]
> [test:32058] [11] ./xhpl(main+0x5d) [0x404e7d]
> [test:32058] [12] /lib64/libc.so.6(__libc_start_main+0xf5) [0x14eb11278c05]
> [test:32058] [13] ./xhpl() [0x4056cb]
> [test:32058] *** End of error message ***
> mpirun noticed that job rank 0 with PID 31481 on node test.novalocal exited on signal 15 (Terminated).
> 23 additional processes aborted (not shown)
> 
> The machine has InfiniBand, so I suspect that Open MPI 1.2 may not support InfiniBand by default. I also tried running without InfiniBand, but then the program can only handle small problem sizes; when I increase the problem size and grid size, it just hangs. The program I run is a standard benchmark, so I don't think the problem is in the code itself. Any ideas? Thanks.
> 


-- 
Jeff Squyres
jsquy...@cisco.com

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
