[OMPI users] Restarting from a checkpoint (OMPI 1.3)

2009-01-20 Thread Gregor Dschung
Hey,

I'm trying the newly released Open MPI 1.3 in conjunction with BLCR to
provide the checkpoint/restart feature.

Configured with ./configure --prefix=/usr/local --with-ft=cr
--enable-ft-thread --enable-mpi-threads --with-blcr=/

An MPI job on a single machine (several threads) checkpoints and
restarts without problems.

Checkpointing an MPI job across two hosts (Ethernet, TCP) also
completes without warnings or errors (the home directory and the directory
containing the MPI application are shared over NFS). The restart works too, but
all threads are started only on the host where I enter the ompi-restart
command. Even if I add the -hostfile argument to ompi-restart, only that
one host is used.

Does anybody have a hint?

Thanks,
Gregor


[OMPI users] Open-MPI 1.3 and environment variables

2009-01-20 Thread jody
Under 1.2.8 I could check
  OMPI_MCA_ns_nds_vpid
to find out the process rank.
Under 1.3 that variable does not seem to exist anymore.
Is there an equivalent to that variable in 1.3?
Have any other environment variables changed?

Thank You
  Jody


Re: [OMPI users] Problem compiling open mpi 1.3 with sunstudio12 express

2009-01-20 Thread Jeff Squyres

Can you send the information listed here:

http://www.open-mpi.org/community/help/


On Jan 19, 2009, at 11:49 AM, Olivier Marsden wrote:


Hello,

I'm trying to compile ompi 1.3rc7 with the Sun Studio Express
compilers.


I'm using the following configure command:

CC=/opt/sun/express/sunstudioceres/bin/cc
CXX=/opt/sun/express/sunstudioceres/bin/CC
F77=/opt/sun/express/sunstudioceres/bin/f77
FC=/opt/sun/express/sunstudioceres/bin/f90  ./configure
--prefix=/opt/mpi_sun --enable-heterogeneous  --enable-shared
--enable-mpi-f90 --with-mpi-f90-size=small --disable-mpi-threads
--disable-progress-threads --disable-debug  --without-udapl
--disable-io-romio


The build and install execute correctly. However, I get the  
following when trying to use mpif90:

>> /opt/mpi_sun/bin/mpif90
gfortran: no input files

My /opt/mpi_sun/share/openmpi/mpif90-wrapper-data.txt file appears
to my layman's eye to be correct, but just in case, its contents are
the following:

project=Open MPI
project_short=OMPI
version=1.3rc7
language=Fortran 90
compiler_env=FC
compiler_flags_env=FCFLAGS
compiler=/opt/sun/express/sunstudioceres/bin/f90
module_option=-M
extra_includes=
preprocessor_flags=
compiler_flags=
linker_flags=
libs=-lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal   -ldl   -Wl,--export-dynamic -lnsl -lutil -lm -ldl

required_file=
includedir=${includedir}
libdir=${libdir}
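
To compare wrapper data files from several installations, a small script can pull out the `compiler` entry. The following is only a sketch: `parse_wrapper_data` is a hypothetical helper (not part of Open MPI), the sample text is an excerpt of the file quoted above, and skipping lines that start with `#` is an assumption about the file format.

```python
# Minimal sketch: parse "key=value" wrapper data and report the compiler.
# parse_wrapper_data is a hypothetical helper, not an Open MPI API.

SAMPLE = """\
project=Open MPI
language=Fortran 90
compiler_env=FC
compiler=/opt/sun/express/sunstudioceres/bin/f90
compiler_flags=
"""


def parse_wrapper_data(text):
    """Return a dict of the key=value pairs found in the wrapper data text."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no settings.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


settings = parse_wrapper_data(SAMPLE)
print("compiler:", settings["compiler"])
```

If the file under /opt/mpi_sun names the Sun f90 but mpif90 still runs gfortran, one possibility is that the wrapper being executed reads a different data file than the one inspected.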


Can anyone see why gfortran is being used? (The config.log says that
Sun f90 is used.)


Thanks,

Olivier


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



[OMPI users] error opal/libltdl

2009-01-20 Thread nilesh barange
Hi...

I am trying to install openmpi-1.2.8 on RHEL-4.
I am getting the following error.


[clususer@vlsiserver openmpi-1.2.8]$ make all install
Making all in config
make[1]: Entering directory `/home/clususer/openmpi-1.2.8/config'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/clususer/openmpi-1.2.8/config'
Making all in contrib
make[1]: Entering directory `/home/clususer/openmpi-1.2.8/contrib'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/clususer/openmpi-1.2.8/contrib'
Making all in opal
make[1]: Entering directory `/home/clususer/openmpi-1.2.8/opal'
Making all in include
make[2]: Entering directory `/home/clususer/openmpi-1.2.8/opal/include'
make  all-am
make[3]: Entering directory `/home/clususer/openmpi-1.2.8/opal/include'
make[3]: Leaving directory `/home/clususer/openmpi-1.2.8/opal/include'
make[2]: Leaving directory `/home/clususer/openmpi-1.2.8/opal/include'
Making all in libltdl
make[2]: Entering directory `/home/clususer/openmpi-1.2.8/opal/libltdl'
make  all-am
make[3]: Entering directory `/home/clususer/openmpi-1.2.8/opal/libltdl'
/bin/sh ./libtool --tag=CC   --mode=link gcc  -O3 -DNDEBUG  -module
-avoid-version  -o dlopen.la  dlopen.lo -ldl -ldl -lnsl  -lutil -lm
libtool: link: false cru .libs/dlopen.a .libs/dlopen.o
make[3]: *** [dlopen.la] Error 1
make[3]: Leaving directory `/home/clususer/openmpi-1.2.8/opal/libltdl'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/home/clususer/openmpi-1.2.8/opal/libltdl'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/clususer/openmpi-1.2.8/opal'
make: *** [all-recursive] Error 1
[clususer@vlsiserver openmpi-1.2.8]$
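
In the failing libtool line, `false cru` sits where an archiver command such as `ar cru` would normally appear. Configure scripts commonly substitute `false` for a tool they could not find, so one plausible but unconfirmed reading is that `ar` was not on the PATH when configure ran. A minimal sketch for checking this, where `check_tools` is a hypothetical helper and the tool list is an assumption:

```python
import shutil


def check_tools(names):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Assumption: these are the tools worth checking for this build failure.
for tool, path in check_tools(["ar", "ranlib", "gcc"]).items():
    print(f"{tool}: {path if path else 'NOT FOUND in PATH'}")
```

If `ar` is reported as missing, installing the package that provides it and re-running configure before make would be the next thing to try.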


Re: [OMPI users] Open-MPI 1.3 and environment variables

2009-01-20 Thread Ralph Castain
That was never an envar for public use, but rather one that was used  
internal to OMPI and therefore subject to change (which it did). There  
are a number of such variables in the system - we try to indicate this  
by not exposing them via ompi_info. Of course, you can independently  
discover them with a printenv, but we would not recommend relying on  
them.


The list of reliable envars is provided here:

http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables

Note that this begins with 1.3 and does not apply to any prior releases.
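
As an illustration, the variables documented in that FAQ entry include OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE. A minimal sketch of reading them from a launched process follows; the helper names `ompi_rank` and `ompi_size` are hypothetical, and the code assumes the process was started by mpirun from 1.3 or later.

```python
import os


def _int_from_env(name, env=None):
    """Return the integer value of an environment variable, or None if unset."""
    env = os.environ if env is None else env
    value = env.get(name)
    return int(value) if value is not None else None


def ompi_rank(env=None):
    """Rank in MPI_COMM_WORLD as exported by mpirun, or None outside mpirun."""
    return _int_from_env("OMPI_COMM_WORLD_RANK", env)


def ompi_size(env=None):
    """Size of MPI_COMM_WORLD as exported by mpirun, or None outside mpirun."""
    return _int_from_env("OMPI_COMM_WORLD_SIZE", env)


print("rank:", ompi_rank(), "size:", ompi_size())
```

Inside an MPI program, calling MPI_Comm_rank remains the portable way to get the rank; the environment variables are mainly useful in wrapper scripts that run before MPI_Init.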

Ralph


On Jan 20, 2009, at 4:43 AM, jody wrote:


Under 1.2.8 I could check
 OMPI_MCA_ns_nds_vpid
to find out the process rank.
Under 1.3 that variable does not seem to exist anymore.
Is there an equivalent to that variable in 1.3?
Have any other environment variables changed?

Thank You
 Jody




Re: [OMPI users] Open-MPI 1.3 and environment variables

2009-01-20 Thread jody
Thanks for the clarification.
Speaking of 'printenv' - I noticed that
even though $HOSTNAME is set on all of my machines:

 aim-nano_03 ~ # echo $HOSTNAME
 aim-nano_03

it does not appear in printenv's output:
  aim-nano_03 opt # printenv | grep HOST
  aim-nano_03 opt #
Is there something special about $HOSTNAME and how or when it is set?
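
One possible explanation, assuming the shell is bash: bash sets HOSTNAME itself as an ordinary shell variable and does not export it, so `echo $HOSTNAME` prints a value while child processes such as printenv do not see it unless a startup file runs `export HOSTNAME`. A program that needs the host name can avoid depending on the variable. A minimal sketch, with `hostname` as a hypothetical helper:

```python
import os
import socket


def hostname(env=None):
    """Prefer an exported HOSTNAME; otherwise ask the operating system."""
    env = os.environ if env is None else env
    # An unexported shell variable is absent here, so fall back to the OS.
    return env.get("HOSTNAME") or socket.gethostname()


print("exported HOSTNAME:", os.environ.get("HOSTNAME"))
print("host name in use: ", hostname())
```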

Jody

On Tue, Jan 20, 2009 at 3:26 PM, Ralph Castain  wrote:
> That was never an envar for public use, but rather one that was used
> internal to OMPI and therefore subject to change (which it did). There are a
> number of such variables in the system - we try to indicate this by not
> exposing them via ompi_info. Of course, you can independently discover them
> with a printenv, but we would not recommend relying on them.
>
> The list of reliable envars is provided here:
>
> http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables
>
> Note that this begins with 1.3 and does not apply to any prior releases.
>
> Ralph


Re: [OMPI users] Open-MPI 1.3 and environment variables

2009-01-20 Thread Ralph Castain

Not that I know of...I would expect it to work.


On Jan 20, 2009, at 8:47 AM, jody wrote:


Thanks for the clarification.
Speaking of 'printenv' - I noticed that
even though $HOSTNAME is set on all of my machines:

aim-nano_03 ~ # echo $HOSTNAME
aim-nano_03

it does not appear in printenv's output:
 aim-nano_03 opt # printenv | grep HOST
 aim-nano_03 opt #
Is there something special about $HOSTNAME and how or when it is set?

Jody





Re: [OMPI users] Problem compiling open mpi 1.3 with sunstudio12 express

2009-01-20 Thread Olivier Marsden

Certainly.
I hope I haven't forgotten anything.

Olivier Marsden


Jeff Squyres wrote:

Can you send the information listed here:

http://www.open-mpi.org/community/help/









ompi-output.tar.bz2
Description: application/bzip


Re: [OMPI users] Problem compiling open mpi 1.3 with sunstudio12 express

2009-01-20 Thread Jeff Squyres
Thanks, that was helpful.  From everything I can see, it looks like  
the ".../f90" value was propagated properly throughout OMPI's code base.


Does f90 invoke gfortran on the back-end?  Try invoking
"/opt/sun/express/sunstudioceres/bin/f90" with no arguments (just like
you invoked mpif90 with no arguments) and see if you get the same error.


You can also invoke "mpif90 --showme" to see the command that mpif90  
would have executed.



On Jan 20, 2009, at 11:15 AM, Olivier Marsden wrote:


Certainly.
I hope I haven't forgotten anything.

Olivier Marsden





--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Problem compiling open mpi 1.3 with sunstudio12 express

2009-01-20 Thread Olivier Marsden
f90 works correctly, when run simply as f90 or as /opt/sun/etc.../f90,
and binaries run properly (sun f90 appears to give excellent
performance, incidentally!)

the command  /opt/mpi_sun/bin/mpif90 --show-me
returns:

/home/marsden/sources/gcc_final/bin/gfortran 
-I/opt/mpi_gfortran4.4//include -pthread -I/opt/mpi_gfortran4.4//lib 
-L/opt/mpi_gfortran4.4//lib -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte 
-lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl
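
To make it easier to see which installation tree a wrapper command refers to, the output can be split into the compiler and its search directories. This is only a sketch: `summarize_showme` is a hypothetical helper, and the input string is the output reported above.

```python
import shlex

# Output reported above, joined into a single command line.
SHOWME = (
    "/home/marsden/sources/gcc_final/bin/gfortran "
    "-I/opt/mpi_gfortran4.4//include -pthread -I/opt/mpi_gfortran4.4//lib "
    "-L/opt/mpi_gfortran4.4//lib -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte "
    "-lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl"
)


def summarize_showme(line):
    """Return (compiler, [directories named by -I and -L options])."""
    tokens = shlex.split(line)
    compiler = tokens[0]
    dirs = [t[2:] for t in tokens[1:] if t.startswith(("-I", "-L"))]
    return compiler, dirs


compiler, dirs = summarize_showme(SHOWME)
print("compiler:", compiler)
for d in dirs:
    print("search dir:", d)
```

Here both the compiler and every search directory belong to the gfortran 4.4 installation, and none point into /opt/mpi_sun.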


For what it's worth, and as you've probably guessed, I do have another
installation of openmpi.
In fact two, one built with the system gcc/gfortran4.2, and one with a
locally compiled gcc/gfortran4.4.
These both work correctly. The second installation seems to be
interfering with my current attempt, even though I exported all
environment variables I can think of to point to the Sun compilers and
libraries first, before configure and compile.


Jeff Squyres wrote:
Thanks, that was helpful.  From everything I can see, it looks like 
the ".../f90" value was propagated properly throughout OMPI's code base.


Does f90 invoke gfortran on the back-end?  Try invoking
"/opt/sun/express/sunstudioceres/bin/f90" with no arguments (just like
you invoked mpif90 with no arguments) and see if you get the same error.


You can also invoke "mpif90 --showme" to see the command that mpif90 
would have executed.










Re: [OMPI users] Problem compiling open mpi 1.3 with sunstudio12 express

2009-01-20 Thread Jeff Squyres

On Jan 20, 2009, at 5:04 PM, Olivier Marsden wrote:

> f90 works correctly, when run simply as f90 or as /opt/sun/etc.../f90,
> and binaries run properly (sun f90 appears to give excellent
> performance, incidentally!)
>
> the command  /opt/mpi_sun/bin/mpif90 --show-me
> returns:
>
> /home/marsden/sources/gcc_final/bin/gfortran
> -I/opt/mpi_gfortran4.4//include -pthread -I/opt/mpi_gfortran4.4//lib
> -L/opt/mpi_gfortran4.4//lib -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte
> -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl

Interesting.

> For what it's worth, and as you've probably guessed, I do have another
> installation of openmpi.
> In fact two, one built with the system gcc/gfortran4.2, and one with a
> locally compiled gcc/gfortran4.4.
> These both work correctly. The second installation seems to be
> interfering with my current attempt, even though I exported all
> environment variables I can think of to point to the Sun compilers and
> libraries first, before configure and compile.


I have oodles of installations of OMPI on my machines; they don't  
interfere with each other.


So let's see if we can figure out why yours don't seem to play well  
together:


- Check that /opt/mpi_sun and /opt/mpi_gfortran* are actually distinct
subdirectories and that there are no hidden sym/hard links in there
somewhere (where directories and/or individual files might accidentally
be pointing to the other tree)


- does "env | grep mpi_" show anything interesting / revealing?  What  
is your LD_LIBRARY_PATH set to?


- what does ldd on the various .so files in /opt/mpi_sun/lib/ show?   
Are they linked against files in their own tree, or the other tree?


- run "mpif90 --showme" through strace; does it show anything  
illuminating?
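
The first two checks in the list above can be sketched in a few lines. This is a rough sketch rather than a complete diagnosis: `same_tree` and `suspicious_env` are hypothetical helpers, the two paths are the prefixes mentioned in this thread, and including variables that begin with OMPI_ or OPAL_ is an assumption that such settings may influence the wrapper compilers. It does not cover the ldd or strace steps.

```python
import os


def same_tree(path_a, path_b):
    """True if both paths resolve (through symlinks) to the same location."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)


def suspicious_env(env=None, needles=("mpi_", "OMPI_", "OPAL_")):
    """Return variables whose name or value contains any of the needles."""
    env = os.environ if env is None else env
    return {
        key: value
        for key, value in env.items()
        if any(n in key or n in value for n in needles)
    }


# Paths taken from the thread; adjust them to the local installations.
print("same tree:", same_tree("/opt/mpi_sun", "/opt/mpi_gfortran4.4"))
for key, value in sorted(suspicious_env().items()):
    print(f"{key}={value}")
```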


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] error opal/libltdl

2009-01-20 Thread Jeff Squyres

Can you send all the information listed here:

http://www.open-mpi.org/community/help/


On Jan 20, 2009, at 8:08 AM, nilesh barange wrote:


Hi...

I am trying to install openmpi-1.2.8 on RHEL-4.
I am getting the following error.


[clususer@vlsiserver openmpi-1.2.8]$ make all install
Making all in config
make[1]: Entering directory `/home/clususer/openmpi-1.2.8/config'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/clususer/openmpi-1.2.8/config'
Making all in contrib
make[1]: Entering directory `/home/clususer/openmpi-1.2.8/contrib'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/clususer/openmpi-1.2.8/contrib'
Making all in opal
make[1]: Entering directory `/home/clususer/openmpi-1.2.8/opal'
Making all in include
make[2]: Entering directory `/home/clususer/openmpi-1.2.8/opal/include'
make  all-am
make[3]: Entering directory `/home/clususer/openmpi-1.2.8/opal/include'
make[3]: Leaving directory `/home/clususer/openmpi-1.2.8/opal/include'
make[2]: Leaving directory `/home/clususer/openmpi-1.2.8/opal/include'
Making all in libltdl
make[2]: Entering directory `/home/clususer/openmpi-1.2.8/opal/libltdl'
make  all-am
make[3]: Entering directory `/home/clususer/openmpi-1.2.8/opal/libltdl'
/bin/sh ./libtool --tag=CC   --mode=link gcc  -O3 -DNDEBUG  -module
-avoid-version  -o dlopen.la  dlopen.lo -ldl -ldl -lnsl  -lutil -lm
libtool: link: false cru .libs/dlopen.a .libs/dlopen.o
make[3]: *** [dlopen.la] Error 1
make[3]: Leaving directory `/home/clususer/openmpi-1.2.8/opal/libltdl'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/home/clususer/openmpi-1.2.8/opal/libltdl'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/clususer/openmpi-1.2.8/opal'
make: *** [all-recursive] Error 1
[clususer@vlsiserver openmpi-1.2.8]$





--
Jeff Squyres
Cisco Systems