Re: [OMPI users] Relocating an Open MPI installation using OPAL_PREFIX

2009-01-06 Thread Brian Barrett

Sorry I haven't jumped in this thread earlier -- I've been a bit behind.

The multi-lib support worked at one time, and I can't think of why it  
would have changed.  The one condition is that libdir, includedir,  
etc. *MUST* be specified relative to $prefix for it to work.  It looks  
like you were defining them as absolute paths, so you'd have to set  
libdir directly, which will never work in multi-lib because mpirun and  
the app likely have different word sizes and therefore different  
libdirs.  More information is on the multilib page in the wiki:


  https://svn.open-mpi.org/trac/ompi/wiki/MultiLib
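As a sketch (untested, with placeholder paths), a multi-lib pair of builds
that keeps everything relative to the prefix might look like:

  # 32-bit build; the default libdir is already '${exec_prefix}/lib'
  ./configure --prefix=/opt/openmpi CFLAGS=-m32 FFLAGS=-m32 ...

  # 64-bit build into the same prefix, but with a prefix-relative libdir
  ./configure --prefix=/opt/openmpi CFLAGS=-m64 FFLAGS=-m64 \
      --libdir='${exec_prefix}/lib/64' ...

Note the single quotes, so the shell passes ${exec_prefix} through literally
rather than expanding it.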

There is actually one condition we do not handle properly, the prefix  
flag to mpirun.  The LD_LIBRARY_PATH will only be set for the word  
size of mpirun, and not the executable.  Really, both would have to be  
added (so that both orted, which is likely always 32 bit in a multilib  
situation and the app both find their libraries).
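In the meantime, a manual workaround (just a sketch; the prefix and the
lib / lib/64 split are placeholders) is to put both libdirs on
LD_LIBRARY_PATH yourself before launching:

  export LD_LIBRARY_PATH=/opt/openmpi/lib:/opt/openmpi/lib/64:$LD_LIBRARY_PATH
  mpirun --prefix /opt/openmpi -np 4 ./my_app

so that the 32-bit orted and the 64-bit application can each find the
libraries for their own word size.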



Brian

On Jan 5, 2009, at 6:02 PM, Jeff Squyres wrote:

I honestly haven't thought through the ramifications of doing a  
multi-lib build with OPAL_PREFIX et al. :-\


If you setenv OPAL_LIBDIR, it'll use whatever you set it to, so it
doesn't matter what you configured --libdir with.  Additionally
mca/installdirs/config/install_dirs.h has this by default:


#define OPAL_LIBDIR "${exec_prefix}/lib"

Hence, if you use a default --libdir and setenv OPAL_PREFIX, then
the libdir should pick up the right thing (because it's based on the
prefix).  But if you use --libdir that is *not* based on
${exec_prefix}, then you might run into problems.


Perhaps you can '--libdir="${exec_prefix}/lib64"' so that you can  
have your custom libdir, but still have it dependent upon the prefix  
that gets expanded at run time...?


(again, I'm not thinking all of this through -- just offering a few  
suggestions off the top of my head that you'll need to test / trace  
the code to be sure...)



On Jan 5, 2009, at 1:35 PM, Ethan Mallove wrote:


On Thu, Dec/25/2008 08:12:49AM, Jeff Squyres wrote:
It's quite possible that we don't handle this situation properly.  Won't
you need two libdirs (one for the 32 bit OMPI executables, and one for the
64 bit MPI apps)?


I don't need an OPAL environment variable for the executables, just a
single OPAL_LIBDIR var for the libraries. (One set of 32-bit
executables runs with both 32-bit and 64-bit libraries.) I'm guessing
OPAL_LIBDIR will not work for you if you configure with a non-standard
--libdir option.

-Ethan




On Dec 23, 2008, at 3:58 PM, Ethan Mallove wrote:


I think the problem is that I am doing a multi-lib build. I have
32-bit libraries in lib/, and 64-bit libraries in lib/64. I assume I
do not see the issue for 32-bit tests, because all the dependencies
are where Open MPI expects them to be. For the 64-bit case, I tried
setting OPAL_LIBDIR to /opt/openmpi-relocated/lib/lib64, but no luck.
Given the below configure arguments, what do my OPAL_* env vars need
to be? (Also, could using --enable-orterun-prefix-by-default interfere
with OPAL_PREFIX?)

 $ ./configure CC=cc CXX=CC F77=f77 FC=f90  --with-openib
--without-udapl --disable-openib-ibcm --enable-heterogeneous
--enable-cxx-exceptions --enable-shared --enable-orterun-prefix-by-default
--with-sge --enable-mpi-f90 --with-mpi-f90-size=small
--disable-mpi-threads --disable-progress-threads   --disable-debug
CFLAGS="-m32 -xO5" CXXFLAGS="-m32 -xO5" FFLAGS="-m32 -xO5" FCFLAGS="-m32 -xO5"
--prefix=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install
--mandir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/man
--libdir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib
--includedir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/include
--without-mx --with-tm=/ws/ompi-tools/orte/torque/current/shared-install32
--with-contrib-vt-flags="--prefix=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install
--mandir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/man
--libdir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib
--includedir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/include
LDFLAGS=-R/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib"


 $ ./configure CC=cc CXX=CC F77=f77 FC=f90  --with-openib
--without-udapl --disable-openib-ibcm --enable-heterogeneous
--enable-cxx-exceptions --enable-shared --enable-orterun-prefix-by-default
--with-sge --enable-mpi-f90 --with-mpi-f90-size=small
--disable-mpi-threads --disable-progress-threads   --disable-debug
CFLAGS="-m64 -xO5" CXXFLAGS="-m64 -xO5" FFLAGS="-m64 -xO5" FCFLAGS="-m64

Re: [OMPI users] using the carto facility

2009-01-06 Thread Terry Dontje
Lydia, sorry I led you astray; I meant for you to use the rankfile
feature as described in the mpirun manpage under the heading "Specifying
Ranks".


--td


Message: 1
Date: Mon, 5 Jan 2009 17:09:41 + (GMT)
From: Lydia Heck 
Subject: [OMPI users] using the carto facility
To: us...@open-mpi.org
Message-ID: 
Content-Type: TEXT/PLAIN; charset=US-ASCII



I was advised for a benchmark to use the OPAL carto option to
assign specific cores to a job. I searched the web for an example
but have only found one set of man pages, which is rather cryptic
and assumes the knowledge of a programmer rather than an end user.

Has anybody out there used this option, and if so, would you be prepared
to share an example which could be adapted for a shared-memory system
with zillions of cores?

Thanks.

Lydia


--
Dr E L  Heck

University of Durham
Institute for Computational Cosmology
Ogden Centre
Department of Physics
South Road

DURHAM, DH1 3LE
United Kingdom

e-mail: lydia.h...@durham.ac.uk

Tel.: + 44 191 - 334 3628
Fax.: + 44 191 - 334 3645



[OMPI users] default hostfile with 1.3 version

2009-01-06 Thread Bernard Secher - SFME/LGLS

Hello,

I took the 1.3 version from the svn repository.

The default hostfile in etc/openmpi-default-hostfile is not used: I
must give mpirun the -hostfile option to make it use this file. Is there
any change in the 1.3 version?


Regards
Bernard





Re: [OMPI users] question running on heterogeneous systems

2009-01-06 Thread Mahmoud Payami
Dear Gus,

Thank you for the detailed explanation. It is quite helpful. I think I
now understand how to manage the problem.

Best regards,

Mahmoud Payami
Theoretical Physics Group,
Atomic Energy Organization of Iran
Tehran-Iran
mpay...@aeoi.org.ir


On Mon, Jan 5, 2009 at 12:21 PM, Gus Correa  wrote:

> Mahmoud Payami wrote:
>
>>
>> On Fri, Jan 2, 2009 at 9:08 AM, doriankrause <doriankra...@web.de> wrote:
>>
>>Mahmoud Payami wrote:
>>
>>
>>Dear OpenMPI Users,
>>
>>I have two systems, one with an Intel64 processor, and one with
>>IA32. The OS on the first is CentOS-x86_64 and on the other
>>CentOS-i386. I installed the Intel fortran compiler 10.1 on both.
>> In the first I use the fce, and in the second I use fc
>>directories (ifortvars.sh/csh). I have compiled openmpi
>>separately on each machine. Now, I could not run my
>>application which is compiled on the ia32 machine. Should I use
>>"fc" instead of "fce" on intel64 and then compile openmpi with
>>that?
>>
>>
>>Could you give us some more information? What is the error message?
>>You said that the application is compiled for the 32 bit
>>architecture. I'm not used to mixing 32/64 bit architectures. Does
>>the application run on each host separately?
>>
>>Dorian
>>
>>
>>
> Hi Mahmoud, list
>
>> Dear Dorian,
>> Thank you for your contribution. The application, compiled on each box
>> separately, is ok with mpi, no problem. Recently, I had checked that a
>> binary file created on ia32 also works on x86_64, but the reverse is not
>> true.
>>
> That is correct.
> x86-64 architecture can run 32-bit binaries,
> but 64-bit binaries don't work on x86 machines.
>
>> So, why not a parallel program which is compiled on the ia32 box? I think, if
>> I configure and install openmpi using the ia32 intel compiler on the x86_64 box,
>> then it will be resolved.
>>
> 1. You need to compile OpenMPI separately on each architecture.
> Use the "--prefix=/path/to/my/openmpi/32bit/" (32-bit example/suggestion)
> configure option, to install the two libraries in different locations,
> if you want. This will make it clear which architecture the library
> was built for.
>
> 2. You need to compile your application separately on each architecture,
> and link to the OpenMPI libraries built for that specific architecture
> according to item 1  above.
> (I.e. don't mix apples and oranges.)
>
> 3. You need to have the correct environment variables set
> on each machine architecture.
> They are *different* on each architecture.
>
> I.e., if you use Intel Fortran,
> source the fc script on the 32bit machine,
> and source the fce script on the 64-bit machine.
>
> This can be done on the .bashrc or .tcshrc file.
> If you have a different home directory on each machine,
> you can write a .bashrc or .tcshrc file for each architecture.
> If you have a single NFS mounted home directory,
> use a trick like this (tcsh example):
>
> if ( $HOST == "my_32bit_hostname" ) then
>   source /path/to/intel/fc/bin/ifortvars.csh # Note "fc" here.
> else if ( $HOST == "my_64bit_hostname"  ) then
>   source /path/to/intel/fce/bin/ifortvars.csh   # Note "fce" here.
> endif
>
> substituting whatever your "my_32bit_hostname", "my_64bit_hostname",
> /path/to/intel/fc/, and /path/to/intel/fce/ actually are.
> (Do "hostname" on each machine to find out the right name to use.)
>
> Likewise for the OpenMPI binaries (mpicc, mpif90, mpirun, etc):
>
> if ( $HOST == "my_32bit_hostname" ) then
>   setenv PATH /path/to/my/openmpi/32bit/bin:$PATH   # Note "32bit" here.
> else if ( $HOST == "my_64bit_hostname"  ) then
>   setenv PATH /path/to/my/openmpi/64bit/bin:$PATH# Note "64bit" here.
> endif
>
> This approach also works for separate home directories "per machine"
> (not NFS mounted), and is probably the simplest way to solve the problem.
> (A bash equivalent of the tcsh snippets above is sketched after item 4 below.)
>
> There are more elegant ways to setup the environment of choice,
> other than changing the user startup files.
> For instance, you can write intel.csh and intel.sh in the /etc/profile.d
> directory,
> to setup the appropriate environment as the user logs in.
> See also the "environment modules" package:
> http://modules.sourceforge.net/
>
> 4. If you run MPI programs across the two machines/architectures,
> make sure to use the MPI types on MPI function calls correctly,
> and to match them properly to the native Fortran (or C) types
> on each machine/architecture.
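>
> For bash users, an equivalent of the tcsh snippets above might look like
> this (an untested sketch; the hostnames, Intel paths, and OpenMPI prefixes
> are the same placeholders as above), placed in .bashrc:
>
> if [ "$HOSTNAME" = "my_32bit_hostname" ]; then
>   source /path/to/intel/fc/bin/ifortvars.sh    # Note "fc" here.
>   export PATH=/path/to/my/openmpi/32bit/bin:$PATH
> elif [ "$HOSTNAME" = "my_64bit_hostname" ]; then
>   source /path/to/intel/fce/bin/ifortvars.sh   # Note "fce" here.
>   export PATH=/path/to/my/openmpi/64bit/bin:$PATH
> fi
>
> Then, on each box, build your application with that install's own wrapper,
> for example:
>
>   mpif90 -o my_app my_app.f90   # picks up the 32-bit or 64-bit install,
>                                 # depending on which PATH is active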
>
> I hope this helps.
> Gus Correa
> -
> Gustavo Correa, PhD - Email: g...@ldeo.columbia.edu
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> -
>
>> I have to check it and will report the result. In the present case, it is
>> searching for shared lib.so.0 which has some extension "..ELF...64". I have
>> already added "/usr/local/lib" which contains mpi libs in LD_L

Re: [OMPI users] using ompi-server on a single node

2009-01-06 Thread Ralph Castain
The code that discovers local interfaces specifically ignores any  
interfaces that are not up or are just local loopbacks. My guess is  
that the person who wrote that code long, long ago was assuming that  
the sole purpose was to talk to remote nodes, not to loop back onto  
yourself.


I imagine it could be changed to include loopback, but I would first  
need to work with other developers to ensure there are no unexpected  
consequences in doing so.
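
If you want to experiment in the meantime, the TCP components do expose
interface-selection parameters (sketched below; given the discovery code
described above, they may well still skip the loopback in current releases):

  # see what the tcp btl offers
  ompi_info --param btl tcp

  # ask the tcp btl and oob to consider only the loopback interface
  mpirun --mca btl_tcp_if_include lo --mca oob_tcp_if_include lo -np 2 ./a.out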


Ralph

On Jan 5, 2009, at 3:49 PM, Terry Frankcombe wrote:


But why doesn't tcp work on loopback?


On Mon, 2009-01-05 at 07:25 -0700, Ralph Castain wrote:

It is currently a known limitation - shared memory currently only
works between procs from the same job. There is an enhancement coming
that will remove this restriction, but it won't be out for some time.

Ralph

On Jan 5, 2009, at 1:06 AM, Thomas Ropars wrote:


Hi,

I've tried to use ompi-server to connect 2 processes belonging to
different jobs but running on the same computer. It works when the
computer has a network interface up. But if the only active network
interface is the local loop, it doesn't work.

According to what I understood reading the code, it is because no btl
component can be used in this case. "tcp" is not used because usually
it is the "sm" component that is used for processes on the same host.
But in that case it doesn't work because "sm" is supposed to work only
for processes of the same job.

I know that this use-case is not very frequent  :)
But is there a solution to make it work, or is it a known
limitation?

Regards

Thomas

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] mpirun hangs

2009-01-06 Thread Ralph Castain


On Jan 5, 2009, at 5:19 PM, Jeff Squyres wrote:


On Jan 5, 2009, at 5:01 PM, Maciej Kazulak wrote:

Interesting though. I thought in such a simple scenario shared
memory would be used for IPC (or whatever's fastest). But nope.
Even with one process it still wants to use TCP/IP to communicate
between mpirun and orted.


Correct -- we only have TCP enabled for MPI process <--> orted  
communication.  There are several reasons why; the simplest is that  
this is our "out of band" channel and it is only used to setup and  
tear down the job.  As such, we don't care that it's a little slower  
than other possible channels (such as sm).  MPI traffic will use  
shmem, OpenFabrics-based networks, Myrinet, ...etc.  But not MPI  
process <--> orted communication.


What's even more surprising to me is that it won't use loopback for that.
Hence my maybe a little bit over-restrictive iptables rules were
the problem. I allowed traffic from 127.0.0.1 to 127.0.0.1 on lo
but not from  to  on eth0, and both processes
ended up waiting for IO.


Can I somehow configure it to use something other than TCP/IP here?  
Or at least switch it to loopback?


I don't remember how it works in the v1.2 series offhand; I think  
it's different in the v1.3 series (where all MPI processes *only*  
talk to the local orted, vs. MPI processes making direct TCP  
connections back to mpirun and any other MPI process with which it  
needs to bootstrap other communication channels).  I'm *guessing*  
that the MPI process <--> orted communication either uses a named  
unix socket or TCP loopback.  Ralph -- can you explain the details?


In the 1.2 series, mpirun spawns a local orted to handle all local  
procs. The code that discovers local interfaces specifically ignores  
any interfaces that are not up or are just local loopbacks. My guess  
is that the person who wrote that code long, long ago was assuming  
that the sole purpose was to talk to remote nodes, not to loop back  
onto yourself.


I imagine it could be changed to include loopback, but I would first  
need to work with other developers to ensure there are no unexpected  
consequences in doing so. Since no TCP interface is found, mpirun fails.


In the 1.3 series, mpirun handles the local procs itself. Thus, this  
issue does not appear and things run just fine.



Ralph



--
Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] default hostfile with 1.3 version

2009-01-06 Thread Ralph Castain
I'm afraid that the changes in how we handle hostfiles forced us to  
remove the default hostfile name. Beginning with 1.3, you will need to  
specify it.


Note that you can do this in your etc/openmpi-mca-params.conf file, if  
you want.


Ralph

On Jan 6, 2009, at 4:36 AM, Bernard Secher - SFME/LGLS wrote:


Hello,

I took the 1.3 version from the svn repository.

The default hostfile in etc/openmpi-default-hostfile is not used: I
must give mpirun the -hostfile option to make it use this file. Is there
any change in the 1.3 version?


Regards
Bernard

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] default hostfile with 1.3 version

2009-01-06 Thread Bernard Secher - SFME/LGLS

How can I do this in my etc/openmpi-mca-params.conf file?

Bernard

Ralph Castain wrote:
I'm afraid that the changes in how we handle hostfiles forced us to 
remove the default hostfile name. Beginning with 1.3, you will need to 
specify it.


Note that you can do this in your etc/openmpi-mca-params.conf file, if 
you want.


Ralph

On Jan 6, 2009, at 4:36 AM, Bernard Secher - SFME/LGLS wrote:


Hello,

I took the 1.3 version from the svn repository.

The default hostfile in etc/openmpi-default-hostfile is not used: I
must give mpirun the -hostfile option to make it use this file. Is there
any change in the 1.3 version?


Regards
Bernard

___

users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--

  _\\|//_
 (' 0 0 ')
ooO  (_) Ooo__
Bernard Sécher  DEN/DM2S/SFME/LGLSmailto : bsec...@cea.fr
CEA Saclay, Bât 454, Pièce 114Phone  : 33 (0)1 69 08 73 78
91191 Gif-sur-Yvette Cedex, FranceFax: 33 (0)1 69 08 10 87
Oooo---
  oooO (   )
  (   ) ) /
   \ ( (_/
\_)


Ce message électronique et tous les fichiers attachés qu'il contient
sont confidentiels et destinés exclusivement à l'usage de la personne
à laquelle ils sont adressés. Si vous avez reçu ce message par erreur,
merci d'en avertir immédiatement son émetteur et de ne pas en conserver
de copie.

This e-mail and any files transmitted with it are confidential and
intended solely for the use of the individual to whom they are addressed.
If you have received this e-mail in error please inform the sender
immediately, without keeping any copy thereof.



Re: [OMPI users] default hostfile with 1.3 version

2009-01-06 Thread Ralph Castain

Just add a line:

orte_default_hostfile = your_hostfile

You might also want to look at the wiki page describing the changed  
behavior for hostfiles:


https://svn.open-mpi.org/trac/ompi/wiki/HostFilePlan

In addition, you might want to look at the mpirun man page as there is  
now a default-hostfile and a hostfile option to mpirun.
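
For example (file names here are just placeholders):

  # in <prefix>/etc/openmpi-mca-params.conf
  orte_default_hostfile = /home/me/my_hostfile

  # or per invocation
  mpirun --default-hostfile /home/me/my_hostfile -np 4 ./a.out
  mpirun --hostfile /home/me/my_hostfile -np 4 ./a.out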


Ralph


On Jan 6, 2009, at 7:39 AM, Bernard Secher - SFME/LGLS wrote:


How can I do this in my etc/openmpi-mca-params.conf file?

Bernard

Ralph Castain wrote:


I'm afraid that the changes in how we handle hostfiles forced us to  
remove the default hostfile name. Beginning with 1.3, you will need  
to specify it.


Note that you can do this in your etc/openmpi-mca-params.conf file,  
if you want.


Ralph

On Jan 6, 2009, at 4:36 AM, Bernard Secher - SFME/LGLS wrote:


Hello,

I took the 1.3 version from the svn repository.

The default hostfile in etc/openmpi-default-hostfile is not used: I
must give mpirun the -hostfile option to make it use this file. Is there
any change in the 1.3 version?


Regards
Bernard

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
   _\\|//_
  (' 0 0 ')
ooO  (_) Ooo__
 Bernard Sécher  DEN/DM2S/SFME/LGLSmailto : bsec...@cea.fr
 CEA Saclay, Bât 454, Pièce 114Phone  : 33 (0)1 69 08 73 78
 91191 Gif-sur-Yvette Cedex, FranceFax: 33 (0)1 69 08 10 87
Oooo---
   oooO (   )
   (   ) ) /
\ ( (_/
 \_)


Ce message électronique et tous les fichiers attachés qu'il contient
sont confidentiels et destinés exclusivement à l'usage de la personne
à laquelle ils sont adressés. Si vous avez reçu ce message par erreur,
merci d'en avertir immédiatement son émetteur et de ne pas en conserver
de copie.

This e-mail and any files transmitted with it are confidential and
intended solely for the use of the individual to whom they are addressed.
If you have received this e-mail in error please inform the sender
immediately, without keeping any copy thereof.

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Relocating an Open MPI installation using OPAL_PREFIX

2009-01-06 Thread Ethan Mallove
On Mon, Jan/05/2009 10:14:30PM, Brian Barrett wrote:
> Sorry I haven't jumped in this thread earlier -- I've been a bit behind.
>
> The multi-lib support worked at one time, and I can't think of why it would 
> have changed.  The one condition is that libdir, includedir, etc. *MUST* be 
> specified relative to $prefix for it to work.  It looks like you were 
> defining them as absolute paths, so you'd have to set libdir directly, 
> which will never work in multi-lib because mpirun and the app likely have 
> different word sizes and therefore different libdirs.  
>

I see. I'll try configuring with relative paths using ${prefix} and
the like.

> More information is on the multilib page in the wiki:
>
>   https://svn.open-mpi.org/trac/ompi/wiki/MultiLib
>

I removed this line from the MultiLib wiki page since Open MPI *is*
now relocatable using the OPAL_PREFIX env vars:

  "Presently, Open MPI is not relocatable.  That is, Open MPI *must*
  be installed and executed from which ever prefix was specified
  during configure.  This is planned to change in the very near
  future."

Thanks,
Ethan


> There is actually one condition we do not handle properly, the prefix flag 
> to mpirun.  The LD_LIBRARY_PATH will only be set for the word size of 
> mpirun, and not the executable.  Really, both would have to be added (so 
> that both orted, which is likely always 32 bit in a multilib situation and 
> the app both find their libraries).
>
> Brian
>
> On Jan 5, 2009, at 6:02 PM, Jeff Squyres wrote:
>
>> I honestly haven't thought through the ramifications of doing a multi-lib 
>> build with OPAL_PREFIX et al. :-\
>>
>> If you setenv OPAL_LIBDIR, it'll use whatever you set it to, so it doesn't 
>> matter what you configured --libdir with.  Additionally 
>> mca/installdirs/config/install_dirs.h has this by default:
>>
>> #define OPAL_LIBDIR "${exec_prefix}/lib"
>>
>> Hence, if you use a default --libdir and setenv OPAL_PREFIX, then the 
>> libdir should pick up the right thing (because it's based on the prefix).  
>> But if you use --libdir that is *not* based on ${exec_prefix}, then you 
>> might run into problems.
>>
>> Perhaps you can '--libdir="${exec_prefix}/lib64"' so that you can have 
>> your custom libdir, but still have it dependent upon the prefix that gets 
>> expanded at run time...?
>>
>> (again, I'm not thinking all of this through -- just offering a few 
>> suggestions off the top of my head that you'll need to test / trace the 
>> code to be sure...)
>>
>>
>> On Jan 5, 2009, at 1:35 PM, Ethan Mallove wrote:
>>
>>> On Thu, Dec/25/2008 08:12:49AM, Jeff Squyres wrote:
 It's quite possible that we don't handle this situation properly.  Won't
you need two libdirs (one for the 32 bit OMPI executables, and one for 
 the
 64 bit MPI apps)?
>>>
>>> I don't need an OPAL environment variable for the executables, just a
>>> single OPAL_LIBDIR var for the libraries. (One set of 32-bit
>>> executables runs with both 32-bit and 64-bit libraries.) I'm guessing
>>> OPAL_LIBDIR will not work for you if you configure with a non-standard
>>> --libdir option.
>>>
>>> -Ethan
>>>
>>>

 On Dec 23, 2008, at 3:58 PM, Ethan Mallove wrote:

> I think the problem is that I am doing a multi-lib build. I have
> 32-bit libraries in lib/, and 64-bit libraries in lib/64. I assume I
> do not see the issue for 32-bit tests, because all the dependencies
> are where Open MPI expects them to be. For the 64-bit case, I tried
> setting OPAL_LIBDIR to /opt/openmpi-relocated/lib/lib64, but no luck.
> Given the below configure arguments, what do my OPAL_* env vars need
> to be? (Also, could using --enable-orterun-prefix-by-default interfere
> with OPAL_PREFIX?)
>
>  $ ./configure CC=cc CXX=CC F77=f77 FC=f90  --with-openib
> --without-udapl --disable-openib-ibcm --enable-heterogeneous
> --enable-cxx-exceptions --enable-shared 
> --enable-orterun-prefix-by-default
> --with-sge --enable-mpi-f90 --with-mpi-f90-size=small
> --disable-mpi-threads --disable-progress-threads   --disable-debug
> CFLAGS="-m32 -xO5" CXXFLAGS="-m32 -xO5" FFLAGS="-m32 -xO5"  
> FCFLAGS="-m32
> -xO5"
> --prefix=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install
> --mandir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/man
> --libdir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib
> --includedir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/include
> --without-mx 
> --with-tm=/ws/ompi-tools/orte/torque/current/shared-install32
> --with-contrib-vt-flags="--prefix=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install
> --mandir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DG

Re: [OMPI users] mpirun hangs

2009-01-06 Thread Maciej Kazulak
2009/1/6 Ralph Castain 

>
> On Jan 5, 2009, at 5:19 PM, Jeff Squyres wrote:
>
>  On Jan 5, 2009, at 5:01 PM, Maciej Kazulak wrote:
>>
>>>  Interesting though. I thought in such a simple scenario shared memory
>>> would be used for IPC (or whatever's fastest). But nope. Even with one
>>> process it still wants to use TCP/IP to communicate between mpirun and
>>> orted.
>>>
>>
>> Correct -- we only have TCP enabled for MPI process <--> orted
>> communication.  There are several reasons why; the simplest is that this is
>> our "out of band" channel and it is only used to setup and tear down the
>> job.  As such, we don't care that it's a little slower than other possible
>> channels (such as sm).  MPI traffic will use shmem, OpenFabrics-based
>> networks, Myrinet, ...etc.  But not MPI process <--> orted communication.
>>
>>>  What's even more surprising to me is that it won't use loopback for that. Hence
>>> my maybe a little bit over-restrictive iptables rules were the problem. I
>>> allowed traffic from 127.0.0.1 to 127.0.0.1 on lo but not from 
>>> to  on eth0, and both processes ended up waiting for IO.
>>>
>>> Can I somehow configure it to use something other than TCP/IP here? Or at
>>> least switch it to loopback?
>>>
>>
>> I don't remember how it works in the v1.2 series offhand; I think it's
>> different in the v1.3 series (where all MPI processes *only* talk to the
>> local orted, vs. MPI processes making direct TCP connections back to mpirun
>> and any other MPI process with which it needs to bootstrap other
>> communication channels).  I'm *guessing* that the MPI process <--> orted
>> communication either uses a named unix socket or TCP loopback.  Ralph -- can
>> you explain the details?
>>
>
> In the 1.2 series, mpirun spawns a local orted to handle all local procs.
> The code that discovers local interfaces specifically ignores any interfaces
> that are not up or are just local loopbacks. My guess is that the person who
> wrote that code long, long ago was assuming that the sole purpose was to
> talk to remote nodes, not to loop back onto yourself.
>
> I imagine it could be changed to include loopback, but I would first need
> to work with other developers to ensure there are no unexpected consequences
> in doing so. Since no TCP interface is found, mpirun fails.
>
> In the 1.3 series, mpirun handles the local procs itself. Thus, this issue
> does not appear and things run just fine.
>
>
> Ralph
>
>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

Thanks for the answer. Think I'll just update my firewall rules for now and
wait for a 1.3 release.
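
In case anyone else hits this, the rules I'm adding are roughly the
following (a sketch only; the interface name and address are obviously
site-specific, and I haven't verified which of the two rules mpirun's
self-connections actually need):

  # accept everything on the loopback interface, whatever the addresses
  iptables -A INPUT -i lo -j ACCEPT

  # and/or let the node's own eth0 address talk to itself over TCP
  iptables -A INPUT -s 192.168.1.10 -d 192.168.1.10 -p tcp -j ACCEPT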


Re: [OMPI users] Open MPI and mpi-defaults

2009-01-06 Thread Adam C Powell IV
On Tue, 2009-01-06 at 12:25 -0600, Dirk Eddelbuettel wrote:
> I noticed that openmpi is now owner of an FTBFS against mpi-defaults because
> the latter wants the former which is missing on Alpha.

I'm sorry, I was supposed to let you know about this, as this openmpi
failure is keeping arpack++ out of Lenny.  The real problem is that
openmpi is FTBFS on alpha, see below.

> Can anybody dive in there and sort this out?

The openmpi buildd log on alpha [1] ends with:
[1] 
http://buildd.debian.org/fetch.cgi?pkg=openmpi;ver=1.2.8-3;arch=alpha;stamp=1225663211

/bin/sh ../../../libtool --tag=CXX   --mode=link g++  -DNDEBUG -g -O2 
-finline-functions -pthread  -export-dynamic   -o ompi_info components.o 
ompi_info.o output.o param.o version.o ../../../ompi/libmpi.la -lnsl  -lutil 
-lm 
libtool: link: g++ -DNDEBUG -g -O2 -finline-functions -pthread -o 
.libs/ompi_info components.o ompi_info.o output.o param.o version.o 
-Wl,--export-dynamic  ../../../ompi/.libs/libmpi.so /usr/lib/libibverbs.so 
-lpthread -lrt 
/build/buildd/openmpi-1.2.8/build/static/orte/.libs/libopen-rte.so 
/build/buildd/openmpi-1.2.8/build/static/opal/.libs/libopen-pal.so -ldl -lnsl 
-lutil -lm -pthread -Wl,-rpath -Wl,/usr/lib/openmpi/lib
../../../ompi/.libs/libmpi.so: undefined reference to 
`opal_sys_timer_get_cycles'
collect2: ld returned 1 exit status
make[3]: *** [ompi_info] Error 1
make[3]: Leaving directory 
`/build/buildd/openmpi-1.2.8/build/static/ompi/tools/ompi_info'

Earlier on we have:

make[3]: Entering directory 
`/build/buildd/openmpi-1.2.8/build/basic/ompi/mca/btl/openib'
...
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include 
-I../../../../orte/include -I../../../../ompi/include 
-I../../../../../../ompi/mca/btl/openib -DPKGDATADIR=\"/usr/share/openmpi\" 
-I../../../../../.. -I../../../.. -I../../../../../../opal/include 
-I../../../../../../orte/include -I../../../../../../ompi/include -DNDEBUG 
-Wall -g -O2 -finline-functions -fno-strict-aliasing -pthread -MT 
btl_openib_component.lo -MD -MP -MF .deps/btl_openib_component.Tpo -c 
../../../../../../ompi/mca/btl/openib/btl_openib_component.c  -fPIC -DPIC -o 
.libs/btl_openib_component.o
../../../../../../ompi/mca/btl/openib/btl_openib_component.c: In function 
'btl_openib_component_init':
../../../../../../ompi/mca/btl/openib/btl_openib_component.c:666: warning: 
implicit declaration of function 'opal_sys_timer_get_cycles'

Okay, found it.  This function is inline assembly in timer.h, which
exists in opal/sys/amd64, ia32, ia64, powerpc and sparcv9 but not alpha,
mips, sparc or win32.  That said, timer.h in opal/sys has:

#ifndef OPAL_HAVE_SYS_TIMER_GET_CYCLES
#define OPAL_HAVE_SYS_TIMER_GET_CYCLES 0

which somehow is working on sparc (no reference to this function in the
buildd log) but not alpha.  (On mips, there are a bunch of assembler
errors of the form "opcode not supported on this processor".)
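
(For anyone who wants to double-check which architectures actually provide
the assembly, something like this from the top of the openmpi-1.2.8 source
tree should do it; I'm going from the tarball layout as I remember it:)

  # every file that mentions the symbol
  grep -rl opal_sys_timer_get_cycles opal/

  # which architecture subdirectories ship a timer.h
  find opal -name timer.h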

That's about what I have time for now.  Don't worry about mpi-defaults,
it's not trying to get into Lenny; but we should worry about OpenMPI not
building on alpha.  Does anyone on us...@open-mpi.org have any ideas?

-Adam
-- 
GPG fingerprint: D54D 1AEE B11C CE9B A02B  C5DD 526F 01E8 564E E4B6

Engineering consulting with open source tools
http://www.opennovation.com/




Re: [OMPI users] Relocating an Open MPI installation using OPAL_PREFIX

2009-01-06 Thread Ethan Mallove
On Tue, Jan/06/2009 10:33:31AM, Ethan Mallove wrote:
> On Mon, Jan/05/2009 10:14:30PM, Brian Barrett wrote:
> > Sorry I haven't jumped in this thread earlier -- I've been a bit behind.
> >
> > The multi-lib support worked at one time, and I can't think of why it would 
> > have changed.  The one condition is that libdir, includedir, etc. *MUST* be 
> > specified relative to $prefix for it to work.  It looks like you were 
> > defining them as absolute paths, so you'd have to set libdir directly, 
> > which will never work in multi-lib because mpirun and the app likely have 
> > different word sizes and therefore different libdirs.  
> >
> 
> I see. I'll try configuring with relative paths using ${prefix} and
> the like.
> 
> > More information is on the multilib page in the wiki:
> >
> >   https://svn.open-mpi.org/trac/ompi/wiki/MultiLib
> >
> 
> I removed this line from the MultiLib wiki page since Open MPI *is*
> now relocatable using the OPAL_PREFIX env vars:
> 
>   "Presently, Open MPI is not relocatable.  That is, Open MPI *must*
>   be installed and executed from which ever prefix was specified
>   during configure.  This is planned to change in the very near
>   future."
> 
> Thanks,
> Ethan
> 
> 
> > There is actually one condition we do not handle properly, the prefix flag 
> > to mpirun.  The LD_LIBRARY_PATH will only be set for the word size of 
> > mpirun, and not the executable.  Really, both would have to be added (so 
> > that both orted, which is likely always 32 bit in a multilib situation and 
> > the app both find their libraries).
> >
> > Brian
> >
> > On Jan 5, 2009, at 6:02 PM, Jeff Squyres wrote:
> >
> >> I honestly haven't thought through the ramifications of doing a multi-lib 
> >> build with OPAL_PREFIX et al. :-\
> >>
> >> If you setenv OPAL_LIBDIR, it'll use whatever you set it to, so it doesn't 
> >> matter what you configured --libdir with.  Additionally 
> >> mca/installdirs/config/install_dirs.h has this by default:
> >>
> >> #define OPAL_LIBDIR "${exec_prefix}/lib"
> >>
> >> Hence, if you use a default --libdir and setenv OPAL_PREFIX, then the 
> >> libdir should pick up the right thing (because it's based on the prefix).  
> >> But if you use --libdir that is *not* based on ${exec_prefix}, then you 
> >> might run into problems.
> >>
> >> Perhaps you can '--libdir="${exec_prefix}/lib64"' so that you can have 
> >> your custom libdir, but still have it dependent upon the prefix that gets 
> >> expanded at run time...?


Can the Open MPI configure setup handle ${exec_prefix} at the command
line? ${exec_prefix} seems to be getting eval'd to "NONE" in the
sub-configures, and I get the following error:

  ...
  *** GNU libltdl setup
  configure: OMPI configuring in opal/libltdl
  configure: running /bin/bash './configure'  'CC=cc' 'CXX=CC' 'F77=f77' 
'FC=f90' '--without-threads' '--enable-heterogeneous' '--enable-cxx-exceptions' 
'--enable-shared' '--enable-orterun-prefix-by-default' '--with-sge' 
'--enable-mpi-f90' '--with-mpi-f90-size=small' '--disable-mpi-threads' 
'--disable-progress-threads' '--disable-debug' 'CFLAGS=-xtarget=ultra3 -m32 
-xarch=sparcvis2 -xprefetch -xprefetch_level=2 -xvector=lib -xdepend=yes 
-xbuiltin=%all -xO5' 'CXXFLAGS=-xtarget=ultra3 -m32 -xarch=sparcvis2 -xprefetch 
-xprefetch_level=2 -xvector=lib -xdepend=yes -xbuiltin=%all -xO5' 
'FFLAGS=-xtarget=ultra3 -m32 -xarch=sparcvis2 -xprefetch -xprefetch_level=2 
-xvector=lib -stackvar -xO5' 'FCFLAGS=-xtarget=ultra3 -m32 -xarch=sparcvis2 
-xprefetch -xprefetch_level=2 -xvector=lib -stackvar -xO5' 
'--prefix=/opt/SUNWhpc/HPC8.2/sun' '--libdir=NONE/lib' 
'--includedir=/opt/SUNWhpc/HPC8.2/sun/include' '--without-mx' 
'--with-tm=/ws/ompi-tools/orte/torque/current/shared-install32' 
'--with-contrib-vt-flags=--prefix=/opt/SUNWhpc/HPC8.2/sun --libdir='/lib' 
--includedir='/include' LDFLAGS=-R/opt/SUNWhpc/HPC8.2/sun/lib' 
'--with-package-string=ClusterTools 8.2' '--with-ident-string=@(#)RELEASE 
VERSION 1.3r20204-ct8.2-b01b-r10' --enable-ltdl-convenience 
--disable-ltdl-install --enable-shared --disable-static --cache-file=/dev/null 
--srcdir=. configure: WARNING: Unrecognized options: --without-threads, 
--enable-heterogeneous, --enable-cxx-exceptions, 
--enable-orterun-prefix-by-default, --with-sge, --enable-mpi-f90, 
--with-mpi-f90-size, --disable-mpi-threads, --disable-progress-threads, 
--disable-debug, --without-mx, --with-tm, --with-contrib-vt-flags, 
--with-package-string, --with-ident-string, --enable-ltdl-convenience
  configure: error: expected an absolute directory name for --libdir: NONE/lib
  configure: /bin/bash './configure' *failed* for opal/libltdl
  configure: error: Failed to build GNU libltdl.  This usually means that 
something
  is incorrectly setup with your environment.  There may be useful information 
in
  opal/libltdl/config.log.  You can also disable GNU libltdl (which will disable
  dynamic shared object loading) by configuring with --disable-dlopen.

It appears the sub-configure n