Gus Correa wrote:
Hi Dorian and Eloi

I wonder whether this is really a Goto BLAS problem, or whether it is
related to how OpenMPI was configured.

In a recent series of postings on this list,
a colleague reported several errors that went away
after he removed the (non-default) "--enable-mpi-threads"
flag from his OpenMPI configuration (and then rebuilt OpenMPI
and recompiled).

See this thread:
http://www.open-mpi.org/community/lists/users/2009/12/11640.php
http://www.open-mpi.org/community/lists/users/2010/01/11695.php

He was also using BLAS (most likely Goto's) in the HPL benchmark.

Did you configure OpenMPI with "--enable-mpi-threads"?
Have you tried without it?
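
One way to check an existing installation is to look at what "ompi_info" reports. The Python sketch below is only an illustration and is not from the original thread. It assumes that "ompi_info" is on the PATH and that its output contains a "Thread support:" line (older releases print something like "posix (mpi: no, progress: no)"; the exact wording may differ between versions). The function names are made up for this example.

```python
import re
import subprocess


def parse_thread_support(ompi_info_text):
    """Return the value of the 'Thread support:' line, or None if absent."""
    match = re.search(r"^\s*Thread support:\s*(.+)$", ompi_info_text, re.MULTILINE)
    return match.group(1).strip() if match else None


def mpi_threads_enabled(thread_support):
    """Best-effort guess: True/False if an 'mpi: yes|no' style field is found.

    Assumption: the field is spelled 'mpi:' (older releases) or
    'MPI_THREAD_MULTIPLE:' (newer releases). Returns None when unsure.
    """
    if thread_support is None:
        return None
    field = re.search(r"(?:mpi|MPI_THREAD_MULTIPLE):\s*(yes|no)", thread_support)
    return None if field is None else field.group(1) == "yes"


def query_ompi_info():
    """Run ompi_info and return its stdout, or None if it cannot be run."""
    try:
        result = subprocess.run(
            ["ompi_info"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout


if __name__ == "__main__":
    text = query_ompi_info()
    if text is None:
        print("ompi_info not found or failed")
    else:
        support = parse_thread_support(text)
        print("Thread support:", support)
        print("MPI threads enabled:", mpi_threads_enabled(support))
```

If this reports that MPI threads are enabled, rebuilding OpenMPI without "--enable-mpi-threads" is the experiment suggested above.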

I hope this helps.
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


Eloi Gaudry wrote:
Dorian Krause wrote:
Hi Eloi,

Do the segmentation faults you're facing also happen in a sequential environment (i.e. not linked against the OpenMPI libraries)?

No, without MPI everything works fine. Linking against mvapich doesn't give any errors either. I think there is a problem between GotoBLAS and the shared library infrastructure of OpenMPI. The code never even reaches the point where the gemm operation is executed.

Have you already informed Kazushige Goto (the developer of GotoBLAS)?

Not yet. Since the problem only happens with openmpi and the BLAS (stand-alone) seems to work, I thought the openmpi mailing list would be the better place to discuss this (to get a grasp of what the error could be before going to the GotoBLAS mailing list).


Regards,
Eloi

PS: Could you post your Makefile.rule here so that we can check which compilation options were chosen?

I didn't make any changes to Makefile.rule. This is the content of Makefile.conf:

OSNAME=Linux
ARCH=x86_64
C_COMPILER=GCC
BINARY32=
BINARY64=1
CEXTRALIB=-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -lc
F_COMPILER=GFORTRAN
FC=gfortran
BU=_
FEXTRALIB=-L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -lgfortran -lm -lgfortran -lm -lc
CORE=BARCELONA
LIBCORE=barcelona
NUM_CORES=8
HAVE_MMX=1
HAVE_SSE=1
HAVE_SSE2=1
HAVE_SSE3=1
HAVE_SSE4A=1
HAVE_3DNOWEX=1
HAVE_3DNOW=1
MAKE += -j 8
SGEMM_UNROLL_M=8
SGEMM_UNROLL_N=4
DGEMM_UNROLL_M=4
DGEMM_UNROLL_N=4
QGEMM_UNROLL_M=2
QGEMM_UNROLL_N=2
CGEMM_UNROLL_M=4
CGEMM_UNROLL_N=2
ZGEMM_UNROLL_M=2
ZGEMM_UNROLL_N=2
XGEMM_UNROLL_M=1
XGEMM_UNROLL_N=1


Thanks,
Dorian

Dorian,

I've been experiencing a similar issue on two different Opteron architectures (22xx and 25x), in a sequential environment, when using v2-1.10 of GotoBLAS. If you can downgrade to version 2-1.09, I bet you will not see these problems. In any case, I'm pretty sure Kazushige is working on a fix right now.

Eloi
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--


Eloi Gaudry

Free Field Technologies
Axis Park Louvain-la-Neuve
Rue Emile Francqui, 1
B-1435 Mont-Saint Guibert
BELGIUM

Company Phone: +32 10 487 959
Company Fax:   +32 10 454 626
