Sorry for the interruption. I'm back on the MPI track again.
I have rebuilt openmpi-1.0.2a9 with -g and the error is unchanged.
I have also discovered that I don't need to run any Open MPI application
to trigger the error. mpirun --help, or mpirun alone, produces the same
failure:
valiron@icare ~ > mpirun
Segmentation fault (core dumped)
and
valiron@icare ~ > pstack core
core 'core' of 13842: mpirun
fffffd7ffee9dfe0 strlen () + 20
fffffd7ffeef6ab3 vsprintf () + 33
fffffd7fff180fd1 opal_vasprintf () + 41
fffffd7fff180f88 opal_asprintf () + 98
00000000004098a3 orterun () + 63
0000000000407214 main () + 34
000000000040708c ???????? ()
Seems very basic!
Using dbx produces a little more information, unfortunately cryptic to me:
valiron@icare ~ > dbx /users/valiron/lib/openmpi-1.0.2a9/bin/mpirun
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in
your .dbxrc
Reading mpirun
Reading ld.so.1
Reading liborte.so.0.0.0
Reading libopal.so.0.0.0
Reading libdl.so.1
Reading libm.so.2
Reading libnsl.so.1
Reading libsocket.so.1
Reading libthread.so.1
Reading libc.so.1
(dbx) run
Running: mpirun
(process id 13881)
t@1 (l@1) signal SEGV (no mapping at the fault address) in strlen at
0xfffffd7ffee9dfe0
0xfffffd7ffee9dfe0: strlen+0x0020: cmpb $0x0000000000000000,(%rsi)
Current function is opal_vasprintf (optimized)
206 length = vsprintf(*ptr, fmt, ap);
(dbx)
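In case it helps narrow things down: one classic way to end up with
exactly this stack on amd64 (vsprintf() faulting inside strlen() on a
%s argument) is a va_list that is traversed twice without va_copy().
The sketch below is purely hypothetical -- it is not the actual
opal_vasprintf() source, just an illustration of that failure mode and
of the va_copy() pattern that avoids it.

/*
 * Hypothetical sketch of a vasprintf()-style helper, NOT the real
 * opal_vasprintf().  If the same va_list is walked once to size the
 * buffer and then walked again by vsprintf() without va_copy(), the
 * second pass reads stale argument state; a bogus "%s" pointer then
 * crashes inside strlen(), as in the pstack output above.
 */
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

static int my_vasprintf(char **ptr, const char *fmt, va_list ap)
{
    va_list ap2;
    int length;

    va_copy(ap2, ap);                      /* duplicate before the first pass */
    length = vsnprintf(NULL, 0, fmt, ap);  /* first pass: compute the length  */
    if (length < 0) {
        va_end(ap2);
        return -1;
    }

    *ptr = malloc((size_t) length + 1);
    if (NULL == *ptr) {
        va_end(ap2);
        return -1;
    }

    length = vsprintf(*ptr, fmt, ap2);     /* second pass must use the copy   */
    va_end(ap2);
    return length;
}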
For reference, I have copied the man page for vsprintf() below:
Standard C Library Functions vprintf(3C)
NAME
vprintf, vfprintf, vsprintf, vsnprintf - print formatted
output of a variable argument list
SYNOPSIS
#include <stdio.h>
#include <stdarg.h>
int vprintf(const char *format, va_list ap);
int vfprintf(FILE *stream, const char *format, va_list ap);
int vsprintf(char *s, const char *format, va_list ap);
int vsnprintf(char *s, size_t n, const char *format, va_list ap);
DESCRIPTION
The vprintf(), vfprintf(), vsprintf() and vsnprintf() func-
tions are the same as printf(), fprintf(), sprintf(), and
snprintf(), respectively, except that instead of being
called with a variable number of arguments, they are called
with an argument list as defined in the <stdarg.h> header.
See printf(3C).
The <stdarg.h> header defines the type va_list and a set of
macros for advancing through a list of arguments whose
number and types may vary. The argument ap to the vprint
family of functions is of type va_list. This argument is
used with the <stdarg.h> header file macros va_start(),
va_arg(), and va_end() (see stdarg(3EXT)). The EXAMPLES
section below demonstrates the use of va_start() and
va_end() with vprintf().
The macro va_alist() is used as the parameter list in a
function definition, as in the function called error() in
the example below. The macro va_start(ap, parmN), where ap
is of type va_list and parmN is the rightmost parameter
(just before ...), must be called before any attempt to
traverse and access unnamed arguments is made. The
va_end(ap) macro must be invoked when all desired arguments
have been accessed. The argument list in ap can be traversed
again if va_start() is called again after va_end(). In the
example below, the error() arguments (arg1, arg2, ...) are
passed to vfprintf() in the argument ap.
RETURN VALUES
Refer to printf(3C).
ERRORS
The vprintf() and vfprintf() functions will fail if either
the stream is unbuffered or the stream's buffer needed to be
flushed and:
EFBIG The file is a regular file and an attempt
was made to write at or beyond the offset
maximum.
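The EXAMPLES section that this DESCRIPTION refers to is not reproduced
in the excerpt above. As a rough sketch (assumed, not the verbatim man
page example), the error() wrapper it describes is the usual
va_start()/vfprintf()/va_end() pattern:

#include <stdio.h>
#include <stdarg.h>

/* Forward a variable argument list to vfprintf(): va_start() binds ap
 * to the arguments after fmt, vfprintf() consumes them, and va_end()
 * releases the list. */
static void error(const char *function_name, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    (void) fprintf(stderr, "ERR in %s: ", function_name);
    (void) vfprintf(stderr, fmt, ap);
    va_end(ap);
}

/* typical use: error("open_file", "cannot open %s\n", path); */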
Any ideas?
Of course I would be glad to provide an account on the machine (but for
security reasons not on the list...).
Pierre.
Brian Barrett wrote:
On Feb 27, 2006, at 8:50 AM, Pierre Valiron wrote:
- Make completed nicely, except for compiling ompi/mpi/f90/mpi.f90,
which took nearly half an hour. I suspect the optimization flags in
FFLAGS are not important for applications, and I could use -O0 or -O1
instead.
You probably won't see any performance impact at all if you compile
the Fortran 90 layer of Open MPI with no optimizations. It's a very
thin wrapper and the compiler isn't going to be able to do much with
it anyway. One other thing: if you know your F90 code never sends
arrays greater than dimension X (X defaults to 4), you can speed
things up immensely by configuring Open MPI with the option
--with-f90-max-array-dim=X.
- However the resulting executable fails to launch:
valiron@icare ~/config > mpirun --prefix /users/valiron/lib/openmpi-1.0.2a9 -np 2 a.out
Segmentation fault (core dumped)
- The problem seems to be buried inside Open MPI:
valiron@icare ~/config > pstack core
core 'core' of 27996: mpirun --prefix /users/valiron/lib/openmpi-1.0.2a9 -np 2 a.out
fffffd7fff05dfe0 strlen () + 20
fffffd7fff0b6ab3 vsprintf () + 33
fffffd7fff2e4211 opal_vasprintf () + 41
fffffd7fff2e41c8 opal_asprintf () + 98
00000000004098a3 orterun () + 63
0000000000407214 main () + 34
000000000040708c ???????? ()
Ugh... Yes, we're probably doing something wrong there.
Unfortunately, neither Jeff nor I have access to an Opteron box
running Solaris, and I can't replicate the problem on either an
UltraSPARC running Solaris or an Opteron running Linux. Could you
compile Open MPI with CFLAGS set to "-g -O -xtarget=opteron
-xarch=amd64"? Hopefully being able to see the call stack with some
line numbers will help a bit.
Brian
--
Support the SAUVONS LA RECHERCHE movement:
http://recherche-en-danger.apinc.org/
_/_/_/_/ _/ _/ Dr. Pierre VALIRON
_/ _/ _/ _/ Laboratoire d'Astrophysique
_/ _/ _/ _/ Observatoire de Grenoble / UJF
_/_/_/_/ _/ _/ BP 53 F-38041 Grenoble Cedex 9 (France)
_/ _/ _/ http://www-laog.obs.ujf-grenoble.fr/~valiron/
_/ _/ _/ Mail: pierre.vali...@obs.ujf-grenoble.fr
_/ _/ _/ Phone: +33 4 7651 4787 Fax: +33 4 7644 8821
_/ _/_/