Yaakoub El Khamra wrote:
Greetings
Can we please verify that this problem is with GotoBLAS and not with
OpenMPI? If I read this correctly, you get normal execution both without
MPI and with other flavors of MPI. That would normally indicate the
problem is on the OpenMPI side.

I am 2 doors away from Kazushige's office. Please do let me know so
that I can talk to him about this.

Regards
Yaakoub El Khamra




On Tue, Jan 19, 2010 at 9:35 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
Hi Dorian and Eloi

I wonder if this is really a Goto BLAS problem or related to
how OpenMPI was configured.

In a recent sequence of postings on this list, a colleague reported
several errors that were fixed after he removed the (non-default)
"--enable-mpi-threads" flag from his OpenMPI configuration
(and rebuilt OpenMPI and recompiled his application).

See this thread:
http://www.open-mpi.org/community/lists/users/2009/12/11640.php
http://www.open-mpi.org/community/lists/users/2010/01/11695.php

He was also using BLAS (most likely Goto's) in the HPL benchmark.

Did you configure OpenMPI with "--enable-mpi-threads"?
Have you tried without it?
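
(As a side note, one way to check what level of thread support an
installed OpenMPI actually provides, independently of how you think it
was configured, is to request MPI_THREAD_MULTIPLE and print what comes
back. The sketch below is only illustrative and the file name is made
up; as far as I know, a build configured without "--enable-mpi-threads"
will report MPI_THREAD_SINGLE here, and "ompi_info | grep -i thread"
should show the same information.)

/* thread_check.c -- hypothetical example, compile with e.g.
 *   mpicc thread_check.c -o thread_check                     */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Ask for the highest thread level and see what the library grants. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    printf("requested MPI_THREAD_MULTIPLE (%d), provided: %d\n",
           MPI_THREAD_MULTIPLE, provided);
    MPI_Finalize();
    return 0;
}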

I hope this helps.
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


Eloi Gaudry wrote:
Dorian Krause wrote:
Hi Eloi,
Do the segmentation faults you're facing also happen in a
sequential environment (i.e. not linked against OpenMPI libraries)?
No, without MPI everything works fine. Linking against MVAPICH doesn't
give any errors either. I think there is a problem between GotoBLAS and
the shared-library infrastructure of OpenMPI: the code never even
reaches the gemm call.
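
(As an aside, a stripped-down test along the following lines can help
pin down where the crash happens; the file name, link line, and matrix
sizes below are only illustrative guesses, not taken from Dorian's
code. If the crash really occurs before the gemm call, this should die
around MPI_Init rather than inside dgemm_. It could be built roughly as
"mpicc gemm_mpi_test.c libgoto2.a -lgfortran -lpthread", adjusting the
library name and path to the local GotoBLAS build.)

/* gemm_mpi_test.c -- hypothetical minimal reproducer                   */
#include <mpi.h>
#include <stdio.h>

/* GotoBLAS exports the Fortran BLAS symbol with a trailing underscore
 * (cf. BU=_ in Makefile.conf below).                                    */
extern void dgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const double *alpha, const double *a, const int *lda,
                   const double *b, const int *ldb,
                   const double *beta, double *c, const int *ldc);

int main(int argc, char **argv)
{
    int n = 2;
    double a[4] = { 1, 0, 0, 1 };   /* 2x2 identity, column-major */
    double b[4] = { 1, 2, 3, 4 };
    double c[4] = { 0, 0, 0, 0 };
    double alpha = 1.0, beta = 0.0;

    MPI_Init(&argc, &argv);
    fprintf(stderr, "MPI_Init done\n");

    /* C = alpha*A*B + beta*C; with A = I the result should equal B. */
    dgemm_("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);
    fprintf(stderr, "dgemm done, c = [%g %g %g %g]\n",
            c[0], c[1], c[2], c[3]);

    MPI_Finalize();
    return 0;
}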

Have you already informed Kazushige Goto (the developer of GotoBLAS)?
Not yet. Since the problem only happens with OpenMPI and the BLAS seems
to work stand-alone, I thought the OpenMPI mailing list would be the
better place to discuss this (to get a grasp of what the error could be
before going to the GotoBLAS mailing list).

Regards,
Eloi

PS: Could you post your Makefile.rule here so that we can check the
compilation options you chose?
I didn't make any changes to Makefile.rule. This is the content
of Makefile.conf:

OSNAME=Linux
ARCH=x86_64
C_COMPILER=GCC
BINARY32=
BINARY64=1
CEXTRALIB=-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -lc
F_COMPILER=GFORTRAN
FC=gfortran
BU=_
FEXTRALIB=-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -lgfortran -lm -lgfortran -lm -lc
CORE=BARCELONA
LIBCORE=barcelona
NUM_CORES=8
HAVE_MMX=1
HAVE_SSE=1
HAVE_SSE2=1
HAVE_SSE3=1
HAVE_SSE4A=1
HAVE_3DNOWEX=1
HAVE_3DNOW=1
MAKE += -j 8
SGEMM_UNROLL_M=8
SGEMM_UNROLL_N=4
DGEMM_UNROLL_M=4
DGEMM_UNROLL_N=4
QGEMM_UNROLL_M=2
QGEMM_UNROLL_N=2
CGEMM_UNROLL_M=4
CGEMM_UNROLL_N=2
ZGEMM_UNROLL_M=2
ZGEMM_UNROLL_N=2
XGEMM_UNROLL_M=1
XGEMM_UNROLL_N=1


Thanks,
Dorian

Dorian,

I've been experiencing a similar issue on two different Opteron
architectures (22xx and 25x), in a sequential environment, when using
v2-1.10 of GotoBLAS. If you can downgrade to version 2-1.09, I bet you
will not see these issues. Anyway, I'm pretty sure Kazushige is
working on a fix right now.

Eloi


Hi Gus and Yaakoub,

I've been able to reproduce a similar issue on Opteron servers, using either a sequential or a parallel binary linked with v2-1.10. With v2-1.09, these segfaults disappear. I've just told Kazushige so.

I don't think the segmentation faults Dorian is experiencing are due to OpenMPI (I'm using a non-mpi-thread-aware build myself).

Regards,
Eloi

