That backtrace is actually failing in the shared memory (sm) BTL, not in the InfiniBand code. But to answer your question: yes, Open MPI 1.2 did have IB support.
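If you want to rule out HPL itself, a bare MPI_Init/MPI_Finalize program should exercise the same startup path -- your backtrace shows the segv happening inside MPI_Init, in the sm BTL's progress function. Something like the sketch below (the file name and the mpicc path are just examples; use whatever wrapper compiler your 1.2.9 install provides):

/* init_test.c: minimal check of whether MPI_Init alone segfaults.
 * Compile with the mpicc wrapper from the same install, e.g.
 *   /root/research/lib/openmpi-1.2.9/bin/mpicc init_test.c -o init_test
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* your backtrace dies inside this call */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    printf("rank %d of %d: MPI_Init succeeded\n", rank, size);

    MPI_Finalize();
    return 0;
}

Run it with at least two processes on a single node so the shared memory BTL actually gets used; if that also segfaults, the problem is in the MPI installation, not in HPL.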
That being said, I have no idea what would cause this shared memory segv -- it's quite possible that it's simple bit rot (v1.2.9 was released 9 years ago -- see https://www.open-mpi.org/software/ompi/versions/timeline.php -- and may simply not function correctly on modern glibc/Linux kernel-based platforms). Can you upgrade to a [much] newer Open MPI?


> On Mar 19, 2018, at 8:29 PM, Kaiming Ouyang <kouya...@ucr.edu> wrote:
> 
> Hi everyone,
> Recently I needed to compile the High-Performance Linpack (HPL) code with Open MPI 1.2 (a fairly old version). Compilation finishes, but when I try to run the benchmark I get the following errors:
> 
> [test:32058] *** Process received signal ***
> [test:32058] Signal: Segmentation fault (11)
> [test:32058] Signal code: Address not mapped (1)
> [test:32058] Failing at address: 0x14a2b84b6304
> [test:32058] [ 0] /lib64/libpthread.so.0(+0xf5e0) [0x14eb116295e0]
> [test:32058] [ 1] /root/research/lib/openmpi-1.2.9/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x28a) [0x14eaa81258aa]
> [test:32058] [ 2] /root/research/lib/openmpi-1.2.9/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x2b) [0x14eaa853219b]
> [test:32058] [ 3] /root/research/lib/openmpi-1.2.9/lib/libopen-pal.so.0(opal_progress+0x4a) [0x14eb128dbaaa]
> [test:32058] [ 4] /root/research/lib/openmpi-1.2.9/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_msg_wait+0x1d) [0x14eaf41e6b4d]
> [test:32058] [ 5] /root/research/lib/openmpi-1.2.9/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_recv+0x3a5) [0x14eaf41eac45]
> [test:32058] [ 6] /root/research/lib/openmpi-1.2.9/lib/libopen-rte.so.0(mca_oob_recv_packed+0x33) [0x14eb12b62223]
> [test:32058] [ 7] /root/research/lib/openmpi-1.2.9/lib/openmpi/mca_gpr_proxy.so(orte_gpr_proxy_put+0x1f9) [0x14eaf3dd7db9]
> [test:32058] [ 8] /root/research/lib/openmpi-1.2.9/lib/libopen-rte.so.0(orte_smr_base_set_proc_state+0x31d) [0x14eb12b7893d]
> [test:32058] [ 9] /root/research/lib/openmpi-1.2.9/lib/libmpi.so.0(ompi_mpi_init+0x8d6) [0x14eb13202136]
> [test:32058] [10] /root/research/lib/openmpi-1.2.9/lib/libmpi.so.0(MPI_Init+0x6a) [0x14eb1322461a]
> [test:32058] [11] ./xhpl(main+0x5d) [0x404e7d]
> [test:32058] [12] /lib64/libc.so.6(__libc_start_main+0xf5) [0x14eb11278c05]
> [test:32058] [13] ./xhpl() [0x4056cb]
> [test:32058] *** End of error message ***
> mpirun noticed that job rank 0 with PID 31481 on node test.novalocal exited on signal 15 (Terminated).
> 23 additional processes aborted (not shown)
> 
> The machine has InfiniBand, so I suspect Open MPI 1.2 may not support InfiniBand by default. I also tried running without InfiniBand, but then the program only handles small input sizes; when I increase the input size and grid size, it just gets stuck. The program I run is a standard benchmark, so I don't think the problem is in its code. Any ideas? Thanks.

-- 
Jeff Squyres
jsquy...@cisco.com