Summary:
OpenMPI 1.1.1 and 1.3a1r11943 have different bugs with regard to
BLACS 1.1p3.
1.3 fails where 1.1.1 passes and vice versa.
(1.1.1): Integer, real, and double precision SDRV tests fail cases 1 &
51, followed by lots of errors until the Integer SUM test, after which
all tests pass.
(1.3): No errors until it crashes on the Complex AMX test (which comes
after the Integer SUM test).
System configuration: Debian 3.1r3 on a dual Opteron, gcc 3.3.5, Intel
ifort 9.1.032.
On Oct 3, 2006, at 2:44 AM, Åke Sandgren wrote:
On Mon, 2006-10-02 at 18:39 -0400, Michael Kluskens wrote:
OpenMPI, BLACS, and blacstester built just fine. Tester reports
errors for integer and real cases #1 and #51 and more for the other
types.
<http://svn.open-mpi.org/trac/ompi/ticket/356> is an open ticket
related to this.
Finally someone else with the same problem!!!
I tried the suggested fix from ticket 356 but it didn't help.
I still get lots of errors in the blacstest.
I'm running on a dual-cpu opteron with Ubuntu dapper and gcc-4.0.
The tests also failed on our i386 Ubuntu breezy system with gcc-3.4.
More details of my two tests:
--------------------------------
OpenMPI 1.1.1
./configure --prefix=/opt/intel9.1/openmpi/1.1.1 F77=ifort FC=ifort --with-mpi-f90-size=medium
BLACS 1.1 patch 3, Bmake.inc based on Bmake.MPI-LINUX with the following
changes:
BTOPdir = /opt/intel9.1/openmpi/1.1.1/BLACS
BLACSDBGLVL = 1
MPIdir = /opt/intel9.1/openmpi/1.1.1
MPILIB =
INTFACE = -DAdd_
F77 = $(MPIdir)/bin/mpif77
CC = $(MPIdir)/bin/mpicc
CCFLAGS = -O3
--------------------------------
OpenMPI 1.3a1r11943
./configure --prefix=/opt/intel9.1/openmpi/1.3 F77=ifort FC=ifort --with-mpi-f90-size=medium
Similar changes for Bmake.inc in BLACS.
Test launched in BLACS/TESTING/EXE using:
mpirun --prefix /opt/intel9.1/openmpi/1.3 -np 4 xCbtest_MPI-LINUX-1
This works much better, with no errors at first, but it eventually fails with:
COMPLEX AMX TESTS: BEGIN.
Signal:11 info.si_errno:0(Success) si_code:128()
Failing at addr:(nil)
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0xb80000
[0] func:/opt/intel9.1/openmpi/1.3/lib/libopal.so.0(opal_backtrace_print+0x1f) [0x2a95aa5c1f]
*** End of error message ***
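
For anyone comparing crash output across runs, here is a minimal sketch of a script that pulls the signal number, si_code, and faulting address out of lines like the ones above. The function name parse_crash_reports and the dictionary layout are my own illustration, not part of Open MPI, and the regular expressions assume only the line format shown in this message.

```python
import re

# Patterns assume the signal-handler line format seen in this report;
# other Open MPI versions may print these lines differently.
SIGNAL_RE = re.compile(
    r"Signal:(\d+)\s+info\.si_errno:(\d+)\((.*?)\)\s+si_code:(\d+)\((.*?)\)"
)
ADDR_RE = re.compile(r"Failing at addr:(\S+)")


def parse_crash_reports(text):
    """Return one dict per 'Signal:' line, paired with the next 'Failing at addr:' line."""
    reports = []
    for line in text.splitlines():
        match = SIGNAL_RE.search(line)
        if match:
            reports.append({
                "signal": int(match.group(1)),
                "si_code": int(match.group(4)),
                "si_code_name": match.group(5),  # empty when the code has no name
                "addr": None,
            })
            continue
        match = ADDR_RE.search(line)
        # Attach the address to the most recent report that lacks one.
        if match and reports and reports[-1]["addr"] is None:
            reports[-1]["addr"] = match.group(1)
    return reports


if __name__ == "__main__":
    sample = (
        "Signal:11 info.si_errno:0(Success) si_code:128()\n"
        "Failing at addr:(nil)\n"
        "Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)\n"
        "Failing at addr:0xb80000\n"
    )
    for report in parse_crash_reports(sample):
        print(report)
```

Saving the mpirun output to a file and passing its contents to this function makes it easier to see whether every rank dies with the same si_code and address.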
Michael