Greg,
I would disagree with your statement that the available Fortran options can't pass a cost-benefit analysis. I have found that for scientific programming (e.g., the Livermore Fortran Kernels and actual PDE solvers) code produced by the Intel compiler runs 25 to 55% faster than code from gfortran or g95. Looking at the cost of adding processors with g95/gfortran to get the same throughput as with ifort, you recover the $549 compiler cost real quickly.
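(Back-of-the-envelope, with made-up hardware numbers rather than anything from an actual quote: take the middle of that range and call ifort code 1.4x faster. To match its throughput with gfortran you need roughly 1.4x the cores, so an 8-core job needs about 3 extra cores; price those at even a couple hundred dollars apiece and you are already past the $549 license on the first box.)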
Doug Reeder
On Mar 6, 2008, at 9:20 AM, Gregory John Orris wrote:
Sorry for the long delay in response.
Let's get back to the beginning:
My original compiler configuration was gcc from the standard Leopard Developer Tools supplied on the installation DVD. This version was 4.0.1. However, it has been significantly modified by Apple to work with Leopard. If you haven't used Apple's Developer Environment, you're missing out on something. It's pretty sweet. But the price you pay for it is no Fortran support (not usually a problem for me, but it is relevant here) and a usually somewhat time-lagged compiler. I'm not as plugged into Apple as perhaps I should be, but I can only imagine that their philosophy is to really over-test their compiler. Gratis, Apple throws into its "frameworks" a shared library called vecLib that includes machine-optimized BLAS and CLAPACK routines. Also, with Leopard, Apple has integrated open-mpi (yea!). But they have once again not included Fortran support (boo!).
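(To illustrate what vecLib gives you for free -- a sketch of my own, not from any build script here; on Leopard the vecLib BLAS/LAPACK can be reached through the Accelerate umbrella framework:)

  /* ddot_demo.c -- illustrative only: call the machine-optimized BLAS that
     ships with OS X.  Compile (roughly): gcc ddot_demo.c -framework Accelerate */
  #include <stdio.h>
  #include <Accelerate/Accelerate.h>   /* pulls in vecLib's cblas interface */

  int main(void)
  {
      double x[3] = {1.0, 2.0, 3.0};
      double y[3] = {4.0, 5.0, 6.0};
      double d = cblas_ddot(3, x, 1, y, 1);  /* dot product, unit strides */
      printf("x.y = %g\n", d);               /* expect 32 */
      return 0;
  }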
Now, to get Fortran on a Mac you have several options (most of which cannot really survive the cost-benefit analysis of a competent manager), but a perfectly fine freeware option is to get it off of hpc.sourceforge.net. This version is based on gcc 4.3.0. There are a few legitimate reasons to stick with Apple's older gcc, as it's not really a good idea to try and mix libraries from one compiler version with another. Especially here, because (without knowing precisely what Apple has done) there is a tremendous difference in execution speed between code compiled with gcc 4.0 and 4.1 as opposed to 4.2 and later. (This has been well documented on many systems.) Also, out of a bit of laziness, I really didn't want to go to the trouble of rewriting (or finding) all of the compiler scripts in the Developer Environment to use the new gcc.
So, I compiled open-mpi-1.2.5 with gcc and g++ 4.0.1 and gfortran 4.3. Then, I compiled BLACS and ScaLAPACK using the configuration from the open-mpi FAQ page. Everything compiles perfectly OK, independent of whether you choose 32- or 64-bit addressing. The first problem was that I was still calling mpicc from the Apple-supplied openmpi and mpif77 from the newly installed distribution. Once again, I've not a clue what Apple has done, but while the two would compile items together, they DO NOT COMMUNICATE properly in 64-bit mode. MPI_COMM_WORLD, even in openMPI's own test routines, would fail! This is the point at which I originated the message asking if anyone had gotten a 64-bit version to actually work. The errors were in libSystem and were not what I'd expect from a simple openmpi error. I believe this problem is caused by a difference in how pointers were/are treated within gcc from version to version. Thus mixing versions essentially caused failures within both the Apple-supplied openmpi distribution and the new one I installed.
How to get over this hurdle? Install the complete gcc 4.3.0 from
the hpc.sourceforge.net site and recompile EVERYTHING!
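(For reference, the kind of smoke test that was failing is nothing exotic -- roughly the following; the file name and run line are my own illustration, not the actual openMPI test source:)

  /* hello_mpi.c -- minimal MPI_COMM_WORLD sanity check (illustrative)
     build/run (roughly): mpicc hello_mpi.c -o hello_mpi && mpirun -np 2 ./hello_mpi */
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* dies here if the handle is broken */
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      printf("rank %d of %d\n", rank, size);
      MPI_Finalize();
      return 0;
  }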
You might think you were done here, but there is one (or actually four) additional problem(s). Now NONE of the complex routines worked. All of the test routines returned failure, and I tracked it down to the fact that pzdotc, pzdotu, pcdotc, and pcdotu inside of the PBLAS routines were failing. Potentially this was a much more difficult problem, since rewriting these codes is really not what I'm paid to do. Tracing down these errors further, I found that the actual problem is with the zdotc, zdotu, cdotc, and cdotu BLAS routines inside of Apple's vecLib. So the problem looked as though a faulty manufacturer-supplied, optimized library was not functioning properly. Well, as it turns out, there is a peculiar difference (again) between versions of the gcc suite in how they handle returned values from complex Fortran functions (I'm only assuming this, since the workaround was successful). This problem has been known for some time now (perhaps 4 years or more). See
http://developer.apple.com/hardware/ve/errata.html#fortran_conventions
How to get over this hurdle? Install ATLAS, CLAPACK, and CBLAS off
the netlib.org web site, and compile them with the gcc 4.3.0 suite.
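(For anyone hitting the same wall, a sketch of why going through the CBLAS interface sidesteps the convention problem: the complex result comes back through a pointer argument rather than as a function return value, so no complex value ever crosses a Fortran calling-convention boundary. The file name and values are my own illustration, not part of the ScaLAPACK build.)

  /* zdotc_demo.c -- illustrative use of CBLAS's zdotc, which returns the
     result via a pointer argument instead of a complex function return */
  #include <stdio.h>
  #include <complex.h>
  #include <cblas.h>   /* from the netlib/ATLAS CBLAS install */

  int main(void)
  {
      double complex x[2] = {1.0 + 2.0*I, 3.0 - 1.0*I};
      double complex y[2] = {2.0 + 0.0*I, 1.0 + 1.0*I};
      double complex result;
      /* conjugated dot product: sum over conj(x[i]) * y[i] */
      cblas_zdotc_sub(2, x, 1, y, 1, &result);
      printf("zdotc = %g + %gi\n", creal(result), cimag(result)); /* expect 4 + 0i */
      return 0;
  }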
So, where am I now? BLACS, ScaLAPACK, and PBLAS work in 64-bit mode with CLAPACK-3.1.1, ATLAS 3.8.1, Open-MPI-1.2.5, and GCC 4.3.0, and they link with ATLAS and CLAPACK and NOT vecLib!
Long way of saying that the problem appears to be solved, but not
well documented (until now)!
Regards,
Greg
On Mar 6, 2008, at 8:25 AM, Terry Dontje wrote:
Ok, I think I found the cause of the SPARC segv when trying to use a 64-bit-compiled Open MPI library. If one does not set the WHATMPI variable in the Bmake.inc, it defaults to UseF77Mpi, which assumes all handles are ints. This is a correct assumption if you are using the F77 interfaces, but the way BLACS seems to compile for Open MPI, it uses the C versions. So the handles are stored as 32 bits in BLACS and passed to the C Open MPI interfaces, which expect 64 bits. In cases where your addresses need more than 32 bits, this will cause MPI to segv when passed an invalid address due to this coercion.
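(To make the coercion concrete -- this is my own illustrative fragment, not the BLACS source; in Open MPI's C interface an MPI_Comm is a pointer, hence 64 bits wide on an LP64 build:)

  /* handle_coerce.c -- illustrative only.  Shows what squeezing a C
     communicator handle through a 32-bit int does, versus the portable
     MPI_Comm_c2f/MPI_Comm_f2c conversions through MPI_Fint. */
  #include <stdint.h>
  #include <mpi.h>

  void sketch(MPI_Comm comm)
  {
      /* What the UseF77Mpi-style assumption amounts to: the upper 32 bits of
         the pointer are dropped, so the rebuilt handle is only valid if the
         object happened to live in the low 4 GB of the address space. */
      int squeezed = (int)(intptr_t)comm;
      MPI_Comm broken = (MPI_Comm)(intptr_t)squeezed;   /* may segv when used */

      /* The conversions MPI itself provides round-trip correctly by design. */
      MPI_Fint fhandle  = MPI_Comm_c2f(comm);
      MPI_Comm restored = MPI_Comm_f2c(fhandle);        /* safe */

      (void)broken; (void)restored;
  }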
So by setting "WHATMPI= -DUseCMpi" I've gotten the SPARC version of BLACS compiled for 64 bits to pass its tests without segv'ing. I do believe this issue actually exists for other platforms (i.e., AMD64 and IA64) with other OSes and compilers; it's just that we've been lucky that MPI_COMM_WORLD is allocated such that it has an address that fits in 32 bits. I am still amazed that we haven't seen this fail in user codes.
Note, I have not confirmed this failure with a test case, but the code stack in dbx looks the same on X64 platforms as on SPARC, except that the address is smaller on the former.
Greg, I would be interested in knowing if you are still seeing the
problem on Leopard and whether the above setting helps any.
--td
Subject: Re: [OMPI users] ScaLapack and BLACS on Leopard
From: Terry Dontje (Terry.Dontje_at_[hidden])
Date: 2008-03-03 07:34:17
What kind of system lib errors are you seeing and do you have a
stack
trace? Note, I was trying something similar with Solaris and 64-
bit on
a SPARC machine and was seeing segv's inside the MPI Library due
to a
pointer being passed through an integer (thus dropping the upper 32
bits). Funny thing is it all works under Solaris on AMD64 or IA-64
platforms.
--td
Date: Thu, 28 Feb 2008 17:50:28 -0500
From: Gregory John Orris <gregory.orris_at_[hidden]>
Subject: [OMPI users] ScaLapack and BLACS on Leopard
To: Open MPI Users <users_at_[hidden]>
Message-ID: <528FD4C0-6157-49CB-80E6-1C62684E4545_at_[hidden]>
Content-Type: text/plain; charset="us-ascii"
Hey Folks,
Anyone got ScaLapack and BLACS working, and not just compiled, under OS X 10.5 in 64-bit mode?
The FAQ site directions were followed and everything compiles just fine. But ALL of the single-precision routines and many of the double-precision routines in the TESTING directory fail with system lib errors.
I've gotten some interesting errors and am wondering what the magic
touch is.
Regards,
Greg
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users