To close out this issue and eat some crow: it turns out the problem I
describe below was caused by a messed-up copy of BLACS. Using the
ScaLAPACK installer to retrieve BLACS, together with the options
mentioned in the Open MPI FAQ, does produce a working BLACS. So there
is no need to change WHATMPI.
Sorry for the misinformation,
--td
Terry Dontje wrote:
Ok, I think I found the cause of the SPARC segv when trying to use a
64-bit compiled Open MPI library. If one does not set the WHATMPI
variable in Bmake.inc, it defaults to UseF77Mpi, which assumes all
handles are ints. This is a correct assumption if you are using the
F77 interfaces, but the way BLACS seems to compile for Open MPI it uses
the C versions. So the handles are stored as 32 bits in BLACS and
passed to the C Open MPI interfaces, which expect 64 bits. In cases
where your addresses need more than 32 bits, this coercion hands MPI
an invalid address and it segvs.
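A minimal sketch of the truncation described above, in plain C with no
MPI dependency. It assumes, as described, that a handle is really a
64-bit pointer value that gets parked in a 32-bit slot on its way to
the C interface. The function names and sample addresses are
hypothetical and are not taken from BLACS or Open MPI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pass a 64-bit handle value through a 32-bit slot,
 * the way a pointer-sized handle would be squeezed through an int-sized
 * variable, then widen it back to 64 bits. */
uint64_t through_32bit_slot(uint64_t handle_bits)
{
    uint32_t slot = (uint32_t)handle_bits; /* upper 32 bits are discarded here */
    return (uint64_t)slot;                 /* widening cannot bring them back */
}

/* Returns 1 if the handle is unchanged by the round trip, 0 otherwise. */
int survives(uint64_t handle_bits)
{
    return through_32bit_slot(handle_bits) == handle_bits;
}

/* Exercises both cases with made-up addresses. */
void demo(void)
{
    /* An address that fits in 32 bits comes back intact (the "lucky" case). */
    assert(survives(0x0000000012345678ULL));

    /* An address that needs more than 32 bits comes back as a different,
     * invalid address. */
    assert(!survives(0xffffffff7f001234ULL));
}
```

Calling demo() from a main() runs both checks. The second case is the
one that would hand the library a bogus pointer.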
So by setting "WHATMPI= -DUseCMpi" I've gotten the SPARC version of
BLACS compiled for 64 bits to pass its tests without segv'ing. I do
believe this issue actually exists on other platforms (e.g. AMD64 and
IA64) with other OSes and compilers; we've just been lucky that
MPI_COMM_WORLD is allocated such that it has an address that fits in
32 bits. I am still amazed that we haven't seen this fail in user
codes. Note, I have not confirmed this failure with a test case, but
the stack in dbx looks the same on X64 platforms as on SPARC, except
that the address is smaller on the former.
Greg, I would be interested in knowing if you are still seeing the
problem on Leopard and whether the above setting helps any.
--td
*Subject:* Re: [OMPI users] ScaLapack and BLACS on Leopard
*From:* Terry Dontje (/Terry.Dontje_at_[hidden]/)
*Date:* 2008-03-03 07:34:17
What kind of system lib errors are you seeing and do you have a stack
trace? Note, I was trying something similar with Solaris and 64-bit on
a SPARC machine and was seeing segv's inside the MPI Library due to a
pointer being passed through an integer (thus dropping the upper 32
bits). Funny thing is it all works under Solaris on AMD64 or IA-64
platforms.
--td
> Date: Thu, 28 Feb 2008 17:50:28 -0500
> From: Gregory John Orris <gregory.orris_at_[hidden]>
> Subject: [OMPI users] ScaLapack and BLACS on Leopard
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <528FD4C0-6157-49CB-80E6-1C62684E4545_at_[hidden]>
> Content-Type: text/plain; charset="us-ascii"
>
> Hey Folks,
>
> Anyone got ScaLapack and BLACS working and not just compiled under
> OSX10.5 in 64-bit mode?
> The FAQ site directions were followed and everything compiles just
> fine. But ALL of the single-precision routines and many of the
> double-precision routines in the TESTING directory fail with system
> lib errors.
>
> I've gotten some interesting errors and am wondering what the magic
> touch is.
>
> Regards,
> Greg
>