This is resolved ;)  On our system, for releases after 1.3a1r18423
up to and including the latest release on the 1.4 trunk, configure
requires the --enable-mpi-threads option to be specified explicitly
for the cpi.c sample to run successfully, as shown here:

# ./configure --prefix=/opt/testing/openmpi/1.4a1r18770 \
   --enable-mpi-threads --with-gm=/opt/gm

# mpirun -np 4 -machinefile ~/bruhosts a.out
Process 1 of 4 is on bru27
Process 3 of 4 is on bru27
Process 0 of 4 is on bru25
Process 2 of 4 is on bru25
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.004372

Omitting that option yields the segfault reported earlier:

# ./configure --prefix=/opt/testing/openmpi/1.4a1r18770 --with-gm=/opt/gm

# mpirun -np 4 -machinefile ~/bruhosts a.out
Process 1 of 4 is on bru27
Process 3 of 4 is on bru27
Process 0 of 4 is on bru25
[bru25:30651] *** Process received signal ***
[bru25:30651] Signal: Segmentation fault (11)
[bru25:30651] Signal code: Address not mapped (1)
[bru25:30651] Failing at address: 0x9
Process 2 of 4 is on bru25
[bru25:30651] [ 0] /lib64/tls/libpthread.so.0 [0x2a95f7e420]
[bru25:30651] [ 1] /opt/sharcnet/testing/openmpi/1.3a1r18740/lib/openmpi/mca_btl_gm.so [0x2a97980fb9]
[bru25:30651] [ 2] /opt/sharcnet/testing/openmpi/1.3a1r18740/lib/openmpi/mca_pml_ob1.so [0x2a97672c1d]
[bru25:30651] [ 3] /opt/sharcnet/testing/openmpi/1.3a1r18740/lib/openmpi/mca_pml_ob1.so [0x2a97667753]
[bru25:30651] [ 4] /opt/sharcnet/testing/openmpi/1.3a1r18740/lib/openmpi/mca_coll_tuned.so [0x2a9857db1c]
[bru25:30651] [ 5] /opt/sharcnet/testing/openmpi/1.3a1r18740/lib/openmpi/mca_coll_tuned.so [0x2a9857de27]
[bru25:30651] [ 6] /opt/sharcnet/testing/openmpi/1.3a1r18740/lib/openmpi/mca_coll_tuned.so [0x2a98573eec]
[bru25:30651] [ 7] /opt/sharcnet/testing/openmpi/current/lib/libmpi.so.0(PMPI_Bcast+0x13e) [0x2a956b405e]
[bru25:30651] [ 8] a.out(main+0xd6) [0x400d0f]
[bru25:30651] [ 9] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x2a960a34bb]
[bru25:30651] [10] a.out [0x400b7a]
[bru25:30651] *** End of error message ***
[bru34:06039] --------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 30651 on node bru25
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------


On Fri, 27 Jun 2008, Doug Roberts wrote:


Hi, I am trying to use the latest release of v1.3 to test with BLCR;
however, I just noticed that sometime after 1.3a1r18423 the standard
mpich sample code (cpi.c) stopped working on our RHEL 4 based Myrinet
GM clusters, which raises some concern.

Please find attached: gm_board_info.out, ompi_info--all.out,
ompi_info--param-btl-gm.out and config-1.4a1r18743.log bundled
in mpi-output.tar.gz for your analysis.

The output below shows that the sample code runs with 1.3a1r18423 but
crashes with 1.3a1r18740, and likewise crashes with every snapshot
newer than 1.3a1r18423 that I have tested.
