Re: [OMPI users] Configure fails with icc 10.1.008

2007-12-10 Thread David Gunter
A quick reading of this thread makes it sound as if you are
using icc to compile C++ code.  The correct compiler to use is icpc.
This has been the case since at least the version 9 release of the
Intel compilers; icc will not compile C++ code.


Hope this is useful.

-david
--
David Gunter
HPC-3: Parallel Tools Team
Los Alamos National Laboratory

On Dec 6, 2007, at 9:25 PM, Eric Thibodeau wrote:


Hello all,

  I am unable to get past ./configure as ICC fails on the C++ tests (see
attached ompi-output.tar.gz).  Configure was called both without and
with sourcing `/opt/intel/cc/10.1.xxx/bin/iccvars.sh`, as per one of
the invocation options in icc's documentation.  I was unable to find the
relevant (well... intelligible for me, that is ;P ) cause of the
failure in config.log.  Any help would be appreciated.


Thanks,

Eric Thibodeau
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-12-10 Thread Jeff Squyres

On Oct 16, 2007, at 11:20 AM, Brian Granger wrote:


Wow, that is quite a study of the different options.  I will spend
some time looking over things to better understand the (complex)
situation.  I will also talk with Lisandro Dalcin about what he thinks
the best approach is for mpi4py.


Brian / Lisandro --

I don't think that I heard back from you on this issue.  Would you  
have major heartburn if I remove all linking of our components against  
libmpi (etc.)?


(for a nicely-formatted refresher of the issues, check out 
https://svn.open-mpi.org/trac/ompi/wiki/Linkers)

Thanks.



One question though.  You said that
nothing had changed in this respect from 1.2.3 to 1.2.4, but 1.2.3
doesn't show the problem.  Does this make sense?

Brian

On 10/16/07, Jeff Squyres  wrote:

On Oct 12, 2007, at 3:5 PM, Brian Granger wrote:

My guess is that Rmpi is dynamically loading libmpi.so, but not
specifying the RTLD_GLOBAL flag. This means that libmpi.so is not
available to the components the way it should be, and all goes
downhill from there. It only mostly works because we do something
silly with how we link most of our components, and Linux is just
smart enough to cover our rears (thankfully).


In mpi4py, libmpi.so is linked in at compile time, not loaded using
dlopen. Granted, the resulting mpi4py binary is loaded into python
using dlopen.


I believe that means that libmpi.so will be loaded as an indirect
dependency of mpi4py.  See the table below.


The pt2pt component (rightly) does not have a -lmpi in its link
line. The other components that use symbols in libmpi.so (wrongly)
do have a -lmpi in their link line. This can cause some problems on
some platforms (Linux tends to do dynamic linking / dynamic loading
better than most). That's why only the pt2pt component fails.


Did this change from 1.2.3 to 1.2.4?


No:

% diff openmpi-1.2.3/ompi/mca/osc/pt2pt/Makefile.am openmpi-1.2.4/ompi/mca/osc/pt2pt/Makefile.am
%


Solutions:

- Someone could make the pt2pt osc component link in libmpi.so
like the rest of the components and hope that no one ever
tries this on a non-friendly platform.


Shouldn't the openmpi build system be able to figure this stuff out on
a per platform basis?


I believe that this would not be useful -- see the tables and
conclusions below.


- Debian (and all Rmpi users) could configure Open MPI with the
--disable-dlopen flag and ignore the problem.


Are there disadvantages to this approach?


You won't be able to add more OMPI components to your existing
installation (e.g., 3rd party components).  But that's probably ok,
at least for now -- not many people are distributing 3rd party OMPI
components.


- Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
flag and fix the problem properly.


Again, my main problem with this solution is that it means I must both
link to libmpi at compile time and load it dynamically using dlopen.
This doesn't seem right.  Also, it makes it impossible on OS X to
avoid setting LD_LIBRARY_PATH (OS X doesn't have rpath).  Being able
to use openmpi without setting LD_LIBRARY_PATH is important.


This is a very complex issue.  Here's the possibilities that I see...
(prepare for confusion!)

======================================================================

This first table represents what happens in the following scenarios:

- compile an application against Open MPI's libmpi, or
- compile an "application" DSO that is dlopen'ed with RTLD_GLOBAL, or
- explicitly dlopen Open MPI's libmpi with RTLD_GLOBAL

                                          OMPI DSO
                 libmpi        OMPI DSO   components
    App linked   includes      components depend on
    against      components?   available? libmpi.so?   Result
    ----------   -----------   ---------- ----------   ---------
1.  libmpi.so    no            no         NA           won't run
2.  libmpi.so    no            yes        no           yes
3.  libmpi.so    no            yes        yes          yes (*1*)
4.  libmpi.so    yes           no         NA           yes
5.  libmpi.so    yes           yes        no           maybe (*2*)
6.  libmpi.so    yes           yes        yes          maybe (*3*)
    ----------   -----------   ---------- ----------   ---------
7.  libmpi.a     no            no         NA           won't run
8.  libmpi.a     no            yes        no           yes (*4*)
9.  libmpi.a     no            yes        yes          no (*5*)
10. libmpi.a     yes           no         NA           yes
11. libmpi.a     yes           yes        no           maybe (*6*)
12. libmpi.a     yes           yes        yes          no (*7*)
    ----------   -----------   ---------- ----------   ---------

All libmpi.a scenarios assume that libmpi.so is also available.

In the OMPI v1.2 series, most components link against libmpi.so, but
some do not (it's our mistake for not being uniform).

(*1*) As far as we know, this works on al

[OMPI users] Question about issue with use of multiple IB ports

2007-12-10 Thread Craig Tierney

I just built OpenMPI-1.2.4 to work on my system (IB, OFED-1.2).
When I run a job, I am getting the following message:

  WARNING: There are more than one active ports on host 'w74', but the
  default subnet GID prefix was detected on more than one of these
  ports.  If these ports are connected to different physical IB
  networks, this configuration will fail in Open MPI.  This version of
  Open MPI requires that every physically separate IB subnet that is
  used between connected MPI processes must have different subnet ID
  values.

I went to the faq to read about the message.  My code does complete
successfully because both nodes are connected by both meshes.

My question is: how can I tell mpirun that I only want to use one
of the ports?  I specifically want to use either port 1 or port 2, but
not bond both together.

Can this be done?

Thanks,
Craig


--
Craig Tierney (craig.tier...@noaa.gov)


Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-12-10 Thread Brian Granger
I don't think this will be a problem.  We are now setting the flags
correctly and doing a dlopen, which should enable the components to
find everything in libmpi.so.  If I remember correctly this new change
would simply make all components compiled in a consistent way.

I will run this by Lisandro and see what he thinks though.  If you
don't hear back from us within a day, assume everything is fine.

Brian

On Dec 10, 2007 10:13 AM, Jeff Squyres  wrote:
> On Oct 16, 2007, at 11:20 AM, Brian Granger wrote:
>
> > Wow, that is quite a study of the different options.  I will spend
> > some time looking over things to better understand the (complex)
> > situation.  I will also talk with Lisandro Dalcin about what he thinks
> > the best approach is for mpi4py.
>
> Brian / Lisandro --
>
> I don't think that I heard back from you on this issue.  Would you
> have major heartburn if I remove all linking of our components against
> libmpi (etc.)?
>
> (for a nicely-formatted refresher of the issues, check out 
> https://svn.open-mpi.org/trac/ompi/wiki/Linkers)
>
> Thanks.
>
>
>
> > One question though.  You said that
> > nothing had changed in this respect from 1.2.3 to 1.2.4, but 1.2.3
> > doesn't show the problem.  Does this make sense?
> >
> > Brian
> >
> > On 10/16/07, Jeff Squyres  wrote:
> >> On Oct 12, 2007, at 3:5 PM, Brian Granger wrote:
>  My guess is that Rmpi is dynamically loading libmpi.so, but not
>  specifying the RTLD_GLOBAL flag. This means that libmpi.so is not
>  available to the components the way it should be, and all goes
>  downhill from there. It only mostly works because we do something
>  silly with how we link most of our components, and Linux is just
>  smart enough to cover our rears (thankfully).
> >>>
> >>> In mpi4py, libmpi.so is linked in at compile time, not loaded using
> >>> dlopen. Granted, the resulting mpi4py binary is loaded into python
> >>> using dlopen.
> >>
> >> I believe that means that libmpi.so will be loaded as an indirect
> >> dependency of mpi4py.  See the table below.
> >>
>  The pt2pt component (rightly) does not have a -lmpi in its link
>  line. The other components that use symbols in libmpi.so (wrongly)
>  do have a -lmpi in their link line. This can cause some problems on
>  some platforms (Linux tends to do dynamic linking / dynamic loading
>  better than most). That's why only the pt2pt component fails.
> >>>
> >>> Did this change from 1.2.3 to 1.2.4?
> >>
> >> No:
> >>
> >> % diff openmpi-1.2.3/ompi/mca/osc/pt2pt/Makefile.am openmpi-1.2.4/ompi/mca/osc/pt2pt/Makefile.am
> >> %
> >>
>  Solutions:
> 
>  - Someone could make the pt2pt osc component link in libmpi.so
>  like the rest of the components and hope that no one ever
>  tries this on a non-friendly platform.
> >>>
> >>> Shouldn't the openmpi build system be able to figure this stuff
> >>> out on
> >>> a per platform basis?
> >>
> >> I believe that this would not be useful -- see the tables and
> >> conclusions below.
> >>
>  - Debian (and all Rmpi users) could configure Open MPI with the
> >>>
>  --disable-dlopen flag and ignore the problem.
> >>>
> >>> Are there disadvantages to this approach?
> >>
> >> You won't be able to add more OMPI components to your existing
> >> installation (e.g., 3rd party components).  But that's probably ok,
> >> at least for now -- not many people are distributing 3rd party OMPI
> >> components.
> >>
>  - Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
>  flag and fix the problem properly.
> >>>
> >>> Again, my main problem with this solution is that it means I must
> >>> both
> >>> link to libmpi at compile time and load it dynamically using dlopen.
> >>> This doesn't seem right. Also, it makes it impossible on OS X to
> >>> avoid setting LD_LIBRARY_PATH (OS X doesn't have rpath). Being able
> >>> to use openmpi without setting LD_LIBRARY_PATH is important.
> >>
> >> This is a very complex issue.  Here's the possibilities that I see...
> >> (prepare for confusion!)
> >>
> >> ======================================================================
> >>
> >> This first table represents what happens in the following scenarios:
> >>
> >> - compile an application against Open MPI's libmpi, or
> >> - compile an "application" DSO that is dlopen'ed with RTLD_GLOBAL, or
> >> - explicitly dlopen Open MPI's libmpi with RTLD_GLOBAL
> >>
> >>                                           OMPI DSO
> >>                  libmpi        OMPI DSO   components
> >>     App linked   includes      components depend on
> >>     against      components?   available? libmpi.so?   Result
> >>     ----------   -----------   ---------- ----------   ---------
> >> 1.  libmpi.so    no            no         NA           won't run
> >> 2.  libmpi.so    no            yes        no           yes
> >> 3.  libmpi.so    no            yes        yes          yes (*1*)
> >

Re: [OMPI users] Question about issue with use of multiple IB ports

2007-12-10 Thread Jeff Squyres

On Dec 10, 2007, at 3:06 PM, Craig Tierney wrote:


I just built OpenMPI-1.2.4 to work on my system (IB, OFED-1.2).
When I run a job, I am getting the following message:

  WARNING: There are more than one active ports on host 'w74', but the
  default subnet GID prefix was detected on more than one of these
  ports.  If these ports are connected to different physical IB
  networks, this configuration will fail in Open MPI.  This version of
  Open MPI requires that every physically separate IB subnet that is
  used between connected MPI processes must have different subnet ID
  values.

I went to the faq to read about the message.  My code does complete
successfully because both nodes are connected by both meshes.


You can also assign a different subnet ID to each of the two fabrics.   
OMPI will therefore be able to tell these two networks apart and you  
won't get this warning message.


We only treat the default subnet ID specially because most people  
don't change it, and if they have multiple fabrics, they could run  
into problems because OMPI won't be able to tell them apart.



My question is: how can I tell mpirun that I only want to use one
of the ports?  I specifically want to use either port 1 or port 2, but
not bond both together.


The OMPI v1.2 series has fairly lame controls for this - you can limit  
how many IB ports an MPI process will use on each machine (via the  
btl_openib_max_btls MCA parameter), but not which ones.  OMPI will use  
the first btl_openib_max_btls ports (the default is infinite).


In OMPI v1.3, there are specific MCA parameters for controlling  
exactly which NICs and/or ports you want to use or not use.   
Specifically:


- btl_openib_if_include: a comma-delimited list of interface names
and/or ports to use
- btl_openib_if_exclude: a comma-delimited list of interface names
and/or ports to exclude (i.e., use all the others)


For example:

  mpirun --mca btl_openib_if_include mthca0,mthca1:1 ...

Meaning "use all ports on mthca0" and "use port 1 on mthca1".

--
Jeff Squyres
Cisco Systems


Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-12-10 Thread Jeff Squyres
Ok.  I was planning to do this for OMPI v1.3 and above, not really
for the OMPI v1.2 series.  We don't have an exact timeframe for OMPI
v1.3 yet -- the best guess at this point is that it'll be somewhere
in 1HCY08.



On Dec 10, 2007, at 5:03 PM, Brian Granger wrote:


I don't think this will be a problem.  We are now setting the flags
correctly and doing a dlopen, which should enable the components to
find everything in libmpi.so.  If I remember correctly this new change
would simply make all components compiled in a consistent way.

I will run this by Lisandro and see what he thinks though.  If you
don't hear back from us within a day, assume everything is fine.

Brian

On Dec 10, 2007 10:13 AM, Jeff Squyres  wrote:

On Oct 16, 2007, at 11:20 AM, Brian Granger wrote:


Wow, that is quite a study of the different options.  I will spend
some time looking over things to better understand the (complex)
situation.  I will also talk with Lisandro Dalcin about what he thinks
the best approach is for mpi4py.


Brian / Lisandro --

I don't think that I heard back from you on this issue.  Would you
have major heartburn if I remove all linking of our components against
libmpi (etc.)?

(for a nicely-formatted refresher of the issues, check out 
https://svn.open-mpi.org/trac/ompi/wiki/Linkers)

Thanks.




One question though.  You said that
nothing had changed in this respect from 1.2.3 to 1.2.4, but 1.2.3
doesn't show the problem.  Does this make sense?

Brian

On 10/16/07, Jeff Squyres  wrote:

On Oct 12, 2007, at 3:5 PM, Brian Granger wrote:

My guess is that Rmpi is dynamically loading libmpi.so, but not
specifying the RTLD_GLOBAL flag. This means that libmpi.so is not
available to the components the way it should be, and all goes
downhill from there. It only mostly works because we do something
silly with how we link most of our components, and Linux is just
smart enough to cover our rears (thankfully).


In mpi4py, libmpi.so is linked in at compile time, not loaded using
dlopen.  Granted, the resulting mpi4py binary is loaded into python
using dlopen.


I believe that means that libmpi.so will be loaded as an indirect
dependency of mpi4py.  See the table below.


The pt2pt component (rightly) does not have a -lmpi in its link
line.  The other components that use symbols in libmpi.so (wrongly)
do have a -lmpi in their link line.  This can cause some problems on
some platforms (Linux tends to do dynamic linking / dynamic loading
better than most).  That's why only the pt2pt component fails.


Did this change from 1.2.3 to 1.2.4?


No:

% diff openmpi-1.2.3/ompi/mca/osc/pt2pt/Makefile.am openmpi-1.2.4/ompi/mca/osc/pt2pt/Makefile.am
%


Solutions:

- Someone could make the pt2pt osc component link in libmpi.so
like the rest of the components and hope that no one ever
tries this on a non-friendly platform.


Shouldn't the openmpi build system be able to figure this stuff out on
a per platform basis?


I believe that this would not be useful -- see the tables and
conclusions below.


- Debian (and all Rmpi users) could configure Open MPI with the
--disable-dlopen flag and ignore the problem.


Are there disadvantages to this approach?


You won't be able to add more OMPI components to your existing
installation (e.g., 3rd party components).  But that's probably ok,
at least for now -- not many people are distributing 3rd party OMPI
components.


- Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
flag and fix the problem properly.


Again, my main problem with this solution is that it means I must both
link to libmpi at compile time and load it dynamically using dlopen.
This doesn't seem right.  Also, it makes it impossible on OS X to
avoid setting LD_LIBRARY_PATH (OS X doesn't have rpath).  Being able
to use openmpi without setting LD_LIBRARY_PATH is important.


This is a very complex issue.  Here's the possibilities that I see...
(prepare for confusion!)

======================================================================

This first table represents what happens in the following scenarios:

- compile an application against Open MPI's libmpi, or
- compile an "application" DSO that is dlopen'ed with RTLD_GLOBAL, or
- explicitly dlopen Open MPI's libmpi with RTLD_GLOBAL

                                          OMPI DSO
                 libmpi        OMPI DSO   components
    App linked   includes      components depend on
    against      components?   available? libmpi.so?   Result
    ----------   -----------   ---------- ----------   ---------
1.  libmpi.so    no            no         NA           won't run
2.  libmpi.so    no            yes        no           yes
3.  libmpi.so    no            yes        yes          yes (*1*)
4.  libmpi.so    yes           no         NA           yes
5.  libmpi.so    yes           yes        no           maybe (*2*)
6.  libmpi.so    yes           yes        yes          maybe (*3*)