Summary:

OpenMPI 1.1.1 and 1.3a1r11943 have different bugs with regard to BLACS 1.1p3.

1.3 fails where 1.1.1 passes and vice versa.

(1.1.1): The integer, real, and double precision SDRV tests fail cases 1 & 51, followed by lots of errors until the Integer SUM test, after which all tests pass.

(1.3): No errors until it crashes on the Complex AMX test (which comes after the Integer SUM test).

System configuration: Debian 3.1r3 on dual opteron, gcc 3.3.5, Intel ifort 9.1.032.

On Oct 3, 2006, at 2:44 AM, Åke Sandgren wrote:

On Mon, 2006-10-02 at 18:39 -0400, Michael Kluskens wrote:
OpenMPI, BLACS, and blacstester built just fine.  Tester reports
errors for integer and real cases #1 and #51 and more for the other
types..

<http://svn.open-mpi.org/trac/ompi/ticket/356> is an open ticket
related to this.

Finally someone else with the same problem!!!

I tried the suggested fix from ticket 356 but it didn't help.
I still get lots of errors in the blacstest.

I'm running on a dual-cpu opteron with Ubuntu dapper and gcc-4.0.
The tests also failed on our i386 Ubuntu breezy system with gcc-3.4

More details of my two tests:
--------------------------------
OpenMPI 1.1.1
./configure --prefix=/opt/intel9.1/openmpi/1.1.1 F77=ifort FC=ifort --with-mpi-f90-size=medium

BLACS 1.1 patch 3, Bmake.inc based on Bmake.MPI-LINUX with the following changes:

BTOPdir = /opt/intel9.1/openmpi/1.1.1/BLACS
BLACSDBGLVL = 1
MPIdir = /opt/intel9.1/openmpi/1.1.1
MPILIB =
INTFACE = -DAdd_
F77            = $(MPIdir)/bin/mpif77
CC             = $(MPIdir)/bin/mpicc
CCFLAGS        = -O3

--------------------------------
OpenMPI 1.3a1r11943
./configure --prefix=/opt/intel9.1/openmpi/1.3 F77=ifort FC=ifort --with-mpi-f90-size=medium

Similar changes to Bmake.inc in BLACS.

Test launched in BLACS/TESTING/EXE using:

mpirun --prefix /opt/intel9.1/openmpi/1.3 -np 4 xCbtest_MPI-LINUX-1

This runs much better, with no errors at first, but eventually fails with:

COMPLEX AMX TESTS: BEGIN.
Signal:11 info.si_errno:0(Success) si_code:128()
Failing at addr:(nil)
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0xb80000
[0] func:/opt/intel9.1/openmpi/1.3/lib/libopal.so.0 (opal_backtrace_print+0x1f) [0x2a95aa5c1f]
*** End of error message ***
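
For anyone wanting to compare the two installs side by side, here is a rough sketch that prints the launch line for each one with its output sent to a separate log. The prefix layout and tester binary name are the ones used above; the helper name blacs_cmd and the log file names are made up for this example, and it assumes each command is then run from the BLACS/TESTING/EXE directory of the BLACS tree built against that install. It only prints the commands and does not run them.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenMPI or BLACS): build the tester
# launch line for one OpenMPI install under /opt/intel9.1/openmpi.
blacs_cmd() {
    version="$1"
    printf 'mpirun --prefix /opt/intel9.1/openmpi/%s -np 4 xCbtest_MPI-LINUX-1\n' "$version"
}

# Print one command per install; the log file names are assumptions.
for v in 1.1.1 1.3; do
    echo "$(blacs_cmd "$v") > blacstest-$v.log 2>&1"
done
```

Once both logs exist, an ordinary diff of the two files should show where the results diverge.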

Michael

