Additional note on the BLACS vs. OpenMPI 1.1.1 & 1.3 problems:

The BLACS install program xtc_CsameF77 says not to use -DCSameF77 with OpenMPI; because of an oversight I used it anyway in my first tests. For OpenMPI 1.1.1 the errors are the same with and without this setting; however, without it the tester program is very slow or hangs at "RUNNING REPEATABLE SUM TEST" near the end. OpenMPI 1.1.2rc1 behaved nearly identically.

With regard to OpenMPI 1.3, not using -DCSameF77 (that is, leaving TRANSCOMM blank) prevents the crash I observed earlier; however, massive errors begin at the "DOUBLE COMPLEX AMX" tests, and the auxiliary tests at the end are then very slow or hang at "RUNNING REPEATABLE SUM TEST".

I don't know enough about the internals of OpenMPI to follow the discussion below, or to tell whether the install program xtc_CsameF77 works correctly with OpenMPI:

#  If you know that your MPI uses the same handles for fortran and C
#  communicators, you can replace the empty macro definition below with
#  the macro definition on the following line.
  TRANSCOMM = -DCSameF77
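
For reference, here is a minimal probe of my own (not the actual xtc_CsameF77 test, and not part of BLACS) that checks whether a Fortran communicator handle could plausibly stand in for a C MPI_Comm; it assumes only the standard MPI_Comm_c2f/MPI_Comm_f2c conversion calls:

/* probe_handles.c -- illustrative only, not the BLACS installer test */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Fint fcomm;
    MPI_Comm back;

    MPI_Init(&argc, &argv);

    /* Fortran INTEGER handle corresponding to the C MPI_COMM_WORLD. */
    fcomm = MPI_Comm_c2f(MPI_COMM_WORLD);

    /* -DCSameF77 asserts that this handle and the C MPI_Comm are
       interchangeable; if their sizes already differ, that cannot hold. */
    printf("sizeof(MPI_Comm) = %lu, sizeof(MPI_Fint) = %lu, fcomm = %d\n",
           (unsigned long) sizeof(MPI_Comm),
           (unsigned long) sizeof(MPI_Fint), (int) fcomm);

    /* The standard round trip should always recover the communicator. */
    back = MPI_Comm_f2c(fcomm);
    printf("MPI_Comm_f2c round trip matches MPI_COMM_WORLD: %s\n",
           back == MPI_COMM_WORLD ? "yes" : "no");

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and run under mpirun, I would expect the two sizes to differ under OpenMPI (its C MPI_Comm is a pointer type while the Fortran handle is an integer), which is presumably why the installer warns against -DCSameF77 there.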


The complete details are below:

# If you know something about your system, you may make it easier for the
#  BLACS to translate between C and fortran communicators.  If the empty
#  macro definition is left alone, this translation will cause the C
#  BLACS to globally block for MPI_COMM_WORLD on calls to BLACS_GRIDINIT
#  and BLACS_GRIDMAP.  If you choose one of the options for translating
#  the context, neither the C or fortran calls will globally block.
#  If you are using MPICH, or a derivative system, you can replace the
#  empty macro definition below with the following (note that if you let
#  MPICH do the translation between C and fortran, you must also indicate
#  here if your system has pointers that are longer than integers.  If so,
#  define -DPOINTER_64_BITS=1.)  For help on setting TRANSCOMM, you can
#  run BLACS/INSTALL/xtc_CsameF77 and BLACS/INSTALL/xtc_UseMpich as
#  explained in BLACS/INSTALL/README.
#   TRANSCOMM = -DUseMpich
#
#  If you know that your MPI uses the same handles for fortran and C
#  communicators, you can replace the empty macro definition below with
#  the macro definition on the following line.
  TRANSCOMM = -DCSameF77
# -----------------------------------------------------------------------
#  TRANSCOMM =
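
As a rough illustration only (this is not actual BLACS source), the macro choice amounts to something like the following when the C side receives a Fortran communicator handle. Note that a real empty-TRANSCOMM build uses the globally blocking scheme described in the comments above; the MPI_Comm_f2c branch is shown here just for contrast:

/* transcomm_sketch.c -- illustrative only */
#include <stdio.h>
#include <mpi.h>

static MPI_Comm from_fortran_handle(MPI_Fint f77_comm)
{
#ifdef CSameF77
    /* Assume the Fortran INTEGER handle *is* the C handle.  Plausible for
       MPICH-style integer handles; invalid for OpenMPI, whose MPI_Comm is
       a pointer type. */
    return (MPI_Comm) (MPI_Aint) f77_comm;
#else
    /* Standard MPI translation, shown for contrast. */
    return MPI_Comm_f2c(f77_comm);
#endif
}

int main(int argc, char **argv)
{
    MPI_Comm comm;
    int rank;

    MPI_Init(&argc, &argv);
    comm = from_fortran_handle(MPI_Comm_c2f(MPI_COMM_WORLD));
    MPI_Comm_rank(comm, &rank);   /* garbage or a crash if the assumption fails */
    printf("rank %d reporting\n", rank);
    MPI_Finalize();
    return 0;
}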

Michael

P.S. I have successfully tested MPICH2 1.0.4p1 with BLACS 1.1p3 on the same machine with the same compilers.


On Oct 3, 2006, at 12:14 PM, Jeff Squyres wrote:

Thanks Michael -- I've updated ticket 356 with this info for v1.1, and
created ticket 464 for the trunk (v1.3) issue.

https://svn.open-mpi.org/trac/ompi/ticket/356
https://svn.open-mpi.org/trac/ompi/ticket/464

On 10/3/06 10:53 AM, "Michael Kluskens" <mk...@ieee.org> wrote:

Summary:

OpenMPI 1.1.1 and 1.3a1r11943 have different bugs with regard to
BLACS 1.1p3.

1.3 fails where 1.1.1 passes, and vice versa.

(1.1.1): Integer, real, and double precision SDRV tests fail cases 1 &
51, then there are lots of errors until the Integer SUM test, after
which all tests pass.

(1.3): No errors until it crashes on the Complex AMX test (which is
after the Integer Sum test).

System configuration: Debian 3.1r3 on dual opteron, gcc 3.3.5, Intel
ifort 9.1.032.

On Oct 3, 2006, at 2:44 AM, Åke Sandgren wrote:

On Mon, 2006-10-02 at 18:39 -0400, Michael Kluskens wrote:
OpenMPI, BLACS, and blacstester built just fine.  The tester reports
errors for integer and real cases #1 and #51, and more for the other
types.

<http://svn.open-mpi.org/trac/ompi/ticket/356> is an open ticket
related to this.

Finally someone else with the same problem!!!

I tried the suggested fix from ticket 356 but it didn't help.
I still get lots of errors in the blacstest.

I'm running on a dual-cpu opteron with Ubuntu dapper and gcc-4.0.
The tests also failed on our i386 Ubuntu breezy system with gcc-3.4.

More details of my two tests:
--------------------------------
OpenMPI 1.1.1
./configure --prefix=/opt/intel9.1/openmpi/1.1.1 F77=ifort FC=ifort --with-mpi-f90-size=medium

BLACS 1.1 patch 3, Bmake.inc based on Bmake.MPI-LINUX with the
following changes:

BTOPdir = /opt/intel9.1/openmpi/1.1.1/BLACS
BLACSDBGLVL = 1
MPIdir = /opt/intel9.1/openmpi/1.1.1
MPILIB =
INTFACE = -DAdd_
F77            = $(MPIdir)/bin/mpif77
CC             = $(MPIdir)/bin/mpicc
CCFLAGS        = -O3

--------------------------------
OpenMPI 1.3a1r11943
./configure --prefix=/opt/intel9.1/openmpi/1.3 F77=ifort FC=ifort --with-mpi-f90-size=medium

similar changes for Bmake.inc in BLACS.

test launched in BLACS/TESTING/EXE using:

mpirun --prefix /opt/intel9.1/openmpi/1.3 -np 4 xCbtest_MPI-LINUX-1

No errors at first, and it works much better, but it eventually fails with:

COMPLEX AMX TESTS: BEGIN.
Signal:11 info.si_errno:0(Success) si_code:128()
Failing at addr:(nil)
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0xb80000
[0] func:/opt/intel9.1/openmpi/1.3/lib/libopal.so.0(opal_backtrace_print+0x1f) [0x2a95aa5c1f]
*** End of error message ***

Michael


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


