Hi Dorian and Eloi
I wonder if this is really a Goto BLAS problem or related to
how OpenMPI was configured.
In a recent sequence of postings on this list,
a colleague reported several errors that went away
after he removed the (non-default) "--enable-mpi-threads"
flag from his OpenMPI configuration (rebuilding OpenMPI
and recompiling his application afterwards).
See this thread:
http://www.open-mpi.org/community/lists/users/2009/12/11640.php
http://www.open-mpi.org/community/lists/users/2010/01/11695.php
He was also using BLAS (most likely Goto's) in the HPL benchmark.
Did you configure OpenMPI with "--enable-mpi-threads"?
Have you tried without it?
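For reference, you can check an existing installation with ompi_info (which ships with every Open MPI install) before rebuilding anything. The configure line below is only an illustrative sketch; the install prefix is an assumption, so adjust it to your site:

# Check whether the installed Open MPI was built with MPI thread
# support; look for the "Thread support" line in the output.
ompi_info | grep -i thread

# Illustrative rebuild without the non-default flag
# (prefix is an assumption -- adjust to your site):
./configure --prefix=/opt/openmpi    # note: no --enable-mpi-threads
make all install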
I hope this helps.
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
Eloi Gaudry wrote:
Dorian Krause wrote:
Hi Eloi,
Do the segmentation faults you're facing also happen in a
sequential environment (i.e. not linked against the OpenMPI libraries)?
No, without MPI everything works fine. Linking against MVAPICH
doesn't give any errors either. I think there is a problem between
GotoBLAS and the shared-library infrastructure of OpenMPI. The code
never even reaches the gemm call.
Have you already informed Kazushige Goto (the developer of GotoBLAS)?
Not yet. Since the problem only happens with OpenMPI and the BLAS
(stand-alone) seems to work, I thought the OpenMPI mailing list would
be the better place to discuss this (to get a grasp of what the error
could be before going to the GotoBLAS mailing list).
Regards,
Eloi
PS: Could you post your Makefile.rule here so that we can check the
compilation options chosen?
I didn't make any changes to Makefile.rule. This is the content
of Makefile.conf:
OSNAME=Linux
ARCH=x86_64
C_COMPILER=GCC
BINARY32=
BINARY64=1
CEXTRALIB=-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -lc
F_COMPILER=GFORTRAN
FC=gfortran
BU=_
FEXTRALIB=-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -lgfortran -lm -lgfortran -lm -lc
CORE=BARCELONA
LIBCORE=barcelona
NUM_CORES=8
HAVE_MMX=1
HAVE_SSE=1
HAVE_SSE2=1
HAVE_SSE3=1
HAVE_SSE4A=1
HAVE_3DNOWEX=1
HAVE_3DNOW=1
MAKE += -j 8
SGEMM_UNROLL_M=8
SGEMM_UNROLL_N=4
DGEMM_UNROLL_M=4
DGEMM_UNROLL_N=4
QGEMM_UNROLL_M=2
QGEMM_UNROLL_N=2
CGEMM_UNROLL_M=4
CGEMM_UNROLL_N=2
ZGEMM_UNROLL_M=2
ZGEMM_UNROLL_N=2
XGEMM_UNROLL_M=1
XGEMM_UNROLL_N=1
Thanks,
Dorian
Dorian,
I've been experiencing a similar issue on two different Opteron
architectures (22xx and 25x), in a sequential environment, when using
v2-1.10 of GotoBLAS. If you can downgrade to version 2-1.09, I bet you
will not experience such issues. Anyway, I'm pretty sure Kazushige is
working on a fix right now.
Eloi