Re: [OMPI users] Unable to compile OpenMPI 1.10.3 with CUDA

2016-10-28 Thread Craig Tierney
Sylvain,

If I do not set --with-cuda, I get:

configure:9964: result: no
configure:10023: checking whether CU_POINTER_ATTRIBUTE_SYNC_MEMOPS is
declared
configure:10023: gcc -c -DNDEBUG   conftest.c >&5
conftest.c:83:19: fatal error: /cuda.h: No such file or directory
 #include </cuda.h>
           ^

If I specify the path to CUDA, I get the same results as before.  In the
configure process, the first time cuda.h is tested, it works:

configure:9843: checking if --with-cuda is set
configure:9897: result: found (/usr/local/cuda/include/cuda.h)
configure:9964: checking for struct CUipcMemHandle_st.reserved

But the next time, the compile command doesn't add an include flag for the
CUDA path to the compile line, and the compile fails:

configure:74312: checking for CL/cl_ext.h
configure:74312: result: no
configure:74425: checking cuda.h usability
configure:74425: gcc -std=gnu99 -c -O3 -DNDEBUG  conftest.c >&5
conftest.c:648:18: fatal error: cuda.h: No such file or directory
 #include <cuda.h>
           ^
compilation terminated.
configure:74425: $? = 1

Craig
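
For reference, a minimal sketch of the configure invocation Sylvain suggests
below, pointing --with-cuda at the CUDA root rather than at its include
directory (variables reused from the build script quoted below; CUDA assumed
to be installed in /usr/local/cuda):

---
export CUDA_HOME=/usr/local/cuda
./configure --prefix=$MPI_HOME --with-cuda=$CUDA_HOME > config.out 2>&1
---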


On Thu, Oct 27, 2016 at 4:47 PM, Sylvain Jeaugey 
wrote:

> I guess --with-cuda is disabling the default CUDA path which is
> /usr/local/cuda. So you should either not set --with-cuda or set
> --with-cuda $CUDA_HOME (no include).
>
> Sylvain
> On 10/27/2016 03:23 PM, Craig tierney wrote:
>
> Hello,
>
> I am trying to build OpenMPI 1.10.3 with CUDA but I am unable to build the
> library that will allow me to use IPC on a node or GDR between nodes.   I
> have tried with 1.10.4 and 2.0.1 and have the same problems.  Here is
> my build script:
>
> ---
> #!/bin/bash
>
> export OPENMPI_VERSION=1.10.3
> export BASEDIR=/tmp/mpi_testing/
> export CUDA_HOME=/usr/local/cuda
> export PATH=$CUDA_HOME/bin/:$PATH
> export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
> export MPI_HOME=$BASEDIR/openmpi-$OPENMPI_VERSION
>
> which nvcc
> nvcc --version
>
> tar -zxf openmpi-$OPENMPI_VERSION.tar.gz
> cd openmpi-$OPENMPI_VERSION
>
> ./configure --prefix=$MPI_HOME --with-cuda=$CUDA_HOME/include > config.out
> 2>&1
>
> make -j > build.out 2>&1
> make install >> build.out 2>&1
> ---
>
> From the docs, it appears that I should not have to set anything but
> --with-cuda since my CUDA is in /usr/local/cuda.  However, I appended
> /usr/local/cuda/include just in case when the first way didn't work.
>
> From the output in config.log, I see that cuda.h is not found.  When the
> tests are called there is no extra include flag added to specify the
> /usr/local/cuda/include path.
>
> With the resulting build, I test for CUDA and GDR with ompi_info.  Results
> are:
>
> testuser@dgx-1:~/temp$ /tmp/mpi_testing/openmpi-1.10.3/bin/ompi_info  |
> grep cuda
>  MCA btl: smcuda (MCA v2.0.0, API v2.0.0, Component
> v1.10.3)
> MCA coll: cuda (MCA v2.0.0, API v2.0.0, Component v1.10.3)
> testuser@dgx-1:~/temp$ /tmp/mpi_testing/openmpi-1.10.3/bin/ompi_info  |
> grep gdr
> testuser@dgx-1:~/temp$
>
> Configure and build logs are attached.
>
>
> Thanks,
> Craig
>
>
>

[OMPI users] mpif90 wrapper is using -pthread as option to ifort, but option is deprecated

2012-11-09 Thread Craig Tierney
I just built OpenMPI 1.6.3 with ifort 12.1.4.  When running ifort I am
getting the warning:

ifort: command line remark #10010: option '-pthread' is deprecated and
will be removed in a future release.  See '-help deprecated'.

Is -pthread really needed?  Is there a configure option to change this,
or should Intel not have changed its behavior from how other compilers work?

Thanks,
Craig


Re: [OMPI users] mpif90 wrapper is using -pthread as option to ifort, but option is deprecated

2012-11-12 Thread Craig Tierney
Thanks all for the suggestions.  It was more of an annoyance than
anything.  I will get the flags removed.  It sounds like someday the
Libtool developers will (hopefully) account for the change and I won't
have to make local mods to the source code.

Thanks,
Craig
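
For reference, a sketch of the wrapper-file edit Martin describes in the
quoted message below (the field names come from his message; the path assumes
$MPI_HOME is the OpenMPI install prefix, and the exact values in each build
will differ):

---
cd $MPI_HOME/share/openmpi
cp mpif90-wrapper-data.txt mpif90-wrapper-data.txt.orig
grep -n 'compiler_flags=\|preprocessor_flags=\|libs=' mpif90-wrapper-data.txt
# then, by hand: drop -pthread from the compiler_flags= line, append
# -D_REENTRANT to preprocessor_flags= and -lpthread to libs=
---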

On Fri, Nov 9, 2012 at 11:45 AM, Martin Siegert  wrote:
> On Fri, Nov 09, 2012 at 11:05:23AM -0700, Craig Tierney wrote:
>> I just built OpenMPI 1.6.3 with ifort 12.1.4.  When running ifort I am
>> getting the warning:
>>
>> ifort: command line remark #10010: open '-pthread' is depreciated and
>> will be removed in a future release.  See '-help deprecated'.
>
> If you just want to get rid of the warning, you can edit the settings
> for the wrapper compilers in share/openmpi/mpif90-wrapper-data.txt, etc.
>
> gcc -dumpspecs | grep pthread
>
> shows that -pthread adds the preprocessor flag -D_REENTRANT and the linker
> flag -lpthread. I.e., removing -pthread from the
>
> compiler_flags=...
>
> line and adding -D_REENTRANT to the
>
> preprocessor_flags=...
>
> line and -lpthread to the
>
> libs=...
>
> line should do the job and should be completely equivalent.
>
> As far as Intel compilers are concerned -pthread can be replaced with
> "-reentrancy threaded", but that does not work when the underlying compiler
> is changed, e.g., OMPI_FC=gfortran.
>
> Cheers,
> Martin
>
> --
> Martin Siegert
> Simon Fraser University
> Burnaby, British Columbia
> Canada
>
>> Is -pthread really needed?  Is there a configure option to change this
>> or should have intel not changed from how other compilers work?
>>
>> Thanks,
>> Craig


[OMPI users] Question about issue with use of multiple IB ports

2007-12-10 Thread Craig Tierney

I just built OpenMPI-1.2.4 to work on my system (IB, OFED-1.2).
When I run a job, I am getting the following message:

  WARNING: There are more than one active ports on host 'w74', but the
  default subnet GID prefix was detected on more than one of these
  ports.  If these ports are connected to different physical IB
  networks, this configuration will fail in Open MPI.  This version of
  Open MPI requires that every physically separate IB subnet that is
  used between connected MPI processes must have different subnet ID
  values.

I went to the FAQ to read about the message.  My code does complete
successfully because both nodes are connected by both meshes.

My question is, how can I tell mpirun that I only want to use one
of the ports?  I specifically want to use either port 1 or port 2, but
not bond both together.

Can this be done?

Thanks,
Craig
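
For what it is worth, newer Open MPI releases expose openib interface
selection through the btl_openib_if_include MCA parameter; a hedged sketch,
where the device:port name mthca0:1 is only an assumption for this cluster:

mpirun --mca btl openib,sm,self --mca btl_openib_if_include mthca0:1 \
    -np 16 ./a.out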


--
Craig Tierney (craig.tier...@noaa.gov)


[OMPI users] Problems building openmpi 1.2.6 with Lahey Fortran

2008-08-01 Thread Craig Tierney

I am trying to build OpenMPI-1.2.6 with Lahey Fortran,
and I am running into problems.  The issue is building
shared libraries with Lahey.  Lahey can do it, but it
doesn't use the -fPIC flag to request it; it uses
--shared.

If I don't include anything, then the build fails near
the end when linking with a Fortran object with an
error message like:


/usr/bin/ld: testcode.o: relocation R_X86_64_32S against `a local
symbol' can not be used when making a shared object; recompile with
-fPIC

If I add --shared to FCFLAGS, the configure process will not finish
(configure line below).  It fails because when configure builds a
small test program, the program seg faults.

# ./configure FCFLAGS=--shared CC=gcc CXX=g++ F77=lf95 FC=lf95 F90=lf95 --prefix=/opt/openmpi/1.2.6-lahey-8.00a --without-gridengine 
--enable-io-romio --with-io-romio-flags=--with-file-sys=nfs+ufs --with-openib=/opt/hjet/ofed/1.3.1


Relevant config.log output:

configure:36725: checking if Fortran compiler works
configure:36781: lf95 -o conftest --shared   conftest.f  >&5
Encountered 0 errors, 0 warnings in file conftest.f.
configure:36784: $? = 0
configure:36790: ./conftest
./configure: line 36791: 29048 Segmentation fault  ./conftest$ac_exeext
configure:36793: $? = 139
configure: program exited with status 139
configure: failed program was:
|   program main
|
|   end


So my hack to fix this was to add --shared to the
FCFLAGS in ompi/mpi/f90/Makefile and build the
code.

What is the correct way for the configure process
to know that, when the compiler is lf95, it should use
--shared when compiling objects?

Thanks,
Craig



--
Craig Tierney (craig.tier...@noaa.gov)


Re: [OMPI users] Fwd: Problems building openmpi 1.2.6 with Lahey Fortran

2008-08-07 Thread Craig Tierney

Jeff Squyres wrote:
Sorry I dropped attention on this thread; Ralph posted a reply earlier 
but it got rejected because he's not a member of the list.  Here's his 
reply.




I will coordinate with Ralf and try to add Lahey support to Libtool.

Craig




Begin forwarded message:


From: Ralf Wildenhues 
Date: August 4, 2008 2:53:49 PM EDT
To: Jeff Squyres 
Cc: Open MPI Users 
Subject: Re: [OMPI users] Problems building openmpi 1.2.6 with Lahey 
Fortran


Hello Craig, Jeff,

* Jeff Squyres wrote on Sun, Aug 03, 2008 at 03:20:17PM CEST:

Open MPI uses GNU Libtool to build itself.  I suspect that perhaps
Libtool doesn't know the Right Mojo to understand the Lahey compilers,
and that's why you're seeing this issue.  As such, it might well be that
your workaround is the best one.

Ralf -- we build the OMPI 1.2 series with that same "late beta" Libtool
(2.1a) that we have forever.  Do you recall offhand if Libtool 2.x 
before

2.2 supported the Lahey fortran compilers?


Libtool does not yet support Lahey.  Neither Absoft Fortran 90
(which was asked about a while ago).

If you would like to see support for Lahey and Absoft in Libtool,
here's what you can do that really helps getting there faster:

- get me some access to these compilers.  A login to a system
 with one of them would be great, but a long-term trial version
 (2 weeks helps little for later regression testing) would be
 better than nothing, too (sometimes a friendly email is all it
 takes for this);

- alternatively, a volunteer that has access to the compilers,
 to help me with the port, or do the porting herself.  This will
 require installing git Libtool and running its testsuite anywhere
 between once and several times, and reading and sending some emails
 with patches resp. test results.


Otherwise, here's some tips for workaround building: edit the generated
libtool scripts (there are a few in the OpenMPI build tree) and set
pic_flag (to --shared), archive_cmds (to contain --shared), and
archive_export_cmds correctly everywhere.  These variables are set once
for each compiler: the C compiler comes at the beginning, all other ones
near the end of the script.

Cheers,
Ralf
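
A sketch of how to locate the per-compiler settings Ralf mentions, run from
the top of the Open MPI build tree (the variable names come from his message;
which occurrences belong to the Fortran compiler still has to be checked by
hand):

find . -name libtool -exec grep -nH 'pic_flag=\|archive.*cmds=' {} +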






--
Craig Tierney (craig.tier...@noaa.gov)


[OMPI users] Passing LD_LIBRARY_PATH to orted

2008-10-10 Thread Craig Tierney

I am having problems launching OpenMPI jobs on my system.  I support
multiple versions of MPI and compilers using GNU Modules.  For the default
compiler, everything is fine.  For non-default compilers, I am having
problems.

I built Openmpi-1.2.6 (and 1.2.7) with the following configure options:

# module load intel/10.1
# ./configure CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort 
--prefix=/opt/openmpi/1.2.7-intel-10.1 --without-
gridengine --enable-io-romio --with-io-romio-flags=--with-file-sys=nfs+ufs 
--with-openib=/opt/hjet/ofed/1.3.1

When I launch a job, I run the module command for the right compiler/MPI
version to set the paths correctly.  Mpirun passes LD_LIBRARY_PATH to the
executable I am launching, but not to orted.

When orted is launched on the remote system, the LD_LIBRARY_PATH
doesn't come with, and the Intel 10.1 libraries can't be found.

/opt/openmpi/1.2.7-intel-10.1/bin/orted: error while loading shared libraries: libintlc.so.5: cannot open shared object file: No 
such file or directory


How do others solve this problem?

Thanks,
Craig


--
Craig Tierney (craig.tier...@noaa.gov)
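
One workaround often used on ssh-launched clusters (a sketch, not an Open MPI
feature; the modules initialization path is an assumption, and the module name
is taken from the recipe above) is to load the matching compiler module from
the remote shell's startup file so orted finds the compiler runtime on
LD_LIBRARY_PATH:

---
# ~/.bashrc on the compute nodes (sketch)
if [ -f /etc/profile.d/modules.sh ]; then
    . /etc/profile.d/modules.sh
    module load intel/10.1
fi
---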


Re: [OMPI users] Passing LD_LIBRARY_PATH to orted

2008-10-14 Thread Craig Tierney

Gus Correa wrote:

Hi Craig, George, list

Here is a quick and dirty solution I used before for a similar problem.
Link the Intel libraries statically,  using the "-static-intel" flag.
Other shared libraries continue to be dynamically linked.

For instance:

mpif90 -static-intel my_mpi_program.f90

What is not clear to me is why to use orted instead of 
mpirun/mpiexec/orterun,

which has a mechanism to pass environment variables to the hosts
with "-x LD_LIBRARY_PATH=/my/intel/lib".

I hope this helps.
Gus Correa



I am not calling orted directly.  I am using mpirun.  Mpirun launches
orted on each node.  Orted will pass the LD_LIBRARY_PATH to the specified
application.  Mpirun does not pass LD_LIBRARY_PATH to orted itself, so
orted fails to launch.

Craig

--
Craig Tierney (craig.tier...@noaa.gov)


Re: [OMPI users] Passing LD_LIBRARY_PATH to orted

2008-10-14 Thread Craig Tierney

George Bosilca wrote:

Craig,

This is a problem with the Intel libraries and not the Open MPI ones. 
You have to somehow make these libraries available on the compute nodes.


What I usually do (but it's not the best way to solve this problem) is 
to copy these libraries somewhere on my home area and to add the 
directory to my LD_LIBRARY_PATH.


  george.



This is ok when you only ever use one compiler, but it isn't very flexible.
I want to keep things as simple as possible for my users, while having a
maintainable system.

The libraries are on the compute nodes; the problem is supporting
multiple versions of compilers.  I can't just list all of the lib paths
in ld.so.conf, because then the user will never get the correct one.  I can't
specify a static LD_LIBRARY_PATH for the same reason.  I would prefer not
to build my system libraries statically.

To the OpenMPI developers, what is your opinion on changing orterun/mpirun
to pass LD_LIBRARY_PATH to the remote hosts when starting OpenMPI processes?
By hand, all that would be done is:

env LD_LIBRARY_PATH=$LD_LIBRARY_PATH $OMPIPATH/orted 

This would ensure that orted is launched correctly.

Or is it better to just build the OpenMPI tools statically?  We also
use other compilers (PGI, Lahey) so I need a solution that works for
all of them.

Thanks,
Craig




On Oct 10, 2008, at 6:17 PM, Craig Tierney wrote:

I am having problems launching openmpi jobs on my system.  I support 
multiple versions
of MPI and compilers using GNU Modules.  For the default compiler, 
everything is fine.

For non-default, I am having problems.

I built Openmpi-1.2.6 (and 1.2.7) with the following configure options:

# module load intel/10.1
# ./configure CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort 
--prefix=/opt/openmpi/1.2.7-intel-10.1 --without-
gridengine --enable-io-romio 
--with-io-romio-flags=--with-file-sys=nfs+ufs 
--with-openib=/opt/hjet/ofed/1.3.1


When I launch a job, I run the module command for the right 
compiler/MPI version to set the paths
correctly.  Mpirun passes LD_LIBRARY_PATH to the executable I am 
launching, but not orted.


When orted is launched on the remote system, the LD_LIBRARY_PATH
doesn't come with, and the Intel 10.1 libraries can't be found.

/opt/openmpi/1.2.7-intel-10.1/bin/orted: error while loading shared 
libraries: libintlc.so.5: cannot open shared object file: No such file 
or directory


How do others solve this problem?

Thanks,
Craig


--
Craig Tierney (craig.tier...@noaa.gov)




--
Craig Tierney (craig.tier...@noaa.gov)


Re: [OMPI users] Passing LD_LIBRARY_PATH to orted

2008-10-14 Thread Craig Tierney

George Bosilca wrote:
The option to expand the remote LD_LIBRARY_PATH, in such a way that Open 
MPI related applications have their dependencies satisfied, is in the 
trunk. The fact that the compiler requires some LD_LIBRARY_PATH is out 
of the scope of an MPI implementation, and I don't think we should take 
care of it.


Passing the local LD_LIBRARY_PATH to the remote nodes doesn't make much 
sense. There are plenty of environment, where the head node have a 
different configuration than the compute nodes. Again, in this case my 
original solution seems not that bad. If you copy (or make a link if you 
prefer) in the Open MPI lib directory to the compiler shared libraries, 
this will work.


  george.



This does work.  It just increases maintenance for each new version
of OpenMPI.  How often does a head node have a different configuration
than the compute nodes?  It would seem that this is even more reason to
pass LD_LIBRARY_PATH for the OpenMPI tools, to support a heterogeneous
configuration as you described.


Thanks,
Craig





On Oct 14, 2008, at 12:11 PM, Craig Tierney wrote:


George Bosilca wrote:

Craig,
This is a problem with the Intel libraries and not the Open MPI ones. 
You have to somehow make these libraries available on the compute nodes.
What I usually do (but it's not the best way to solve this problem) 
is to copy these libraries somewhere on my home area and to add the 
directory to my LD_LIBRARY_PATH.

 george.


This is ok when you only ever use one compiler, but it isn't very 
flexible.
I want to keep it as simple as possible for my users, while having a 
maintainable

system.

The libraries are on the compute nodes, the problem deals with supporting
multiple versions of compilers.  I can't just list all of the lib paths
in ld.so.conf, because then the user will never get the correct one.  
I can't

specify a static LD_LIBRARY_PATH for the same reason.  I would prefer not
to build my system libraries static.

To the OpenMPI developers, what is your opinion on changing 
orterun/mpirun
to pass LD_LIBRARY_PATH to the remote hosts when starting OpenMPI 
processes?

By hand, all that would be done is:

env LD_LIBRARY_PATH=$LD_LIBRARY_PATH $OPMIPATH/orted 

This would ensure that orted is launched correctly.

Or is it better to just build the OpenMPI tools statically?  We also
use other compilers (PGI, Lahey) so I need a solution that works for
all of them.

Thanks,
Craig




On Oct 10, 2008, at 6:17 PM, Craig Tierney wrote:
I am having problems launching openmpi jobs on my system.  I support 
multiple versions
of MPI and compilers using GNU Modules.  For the default compiler, 
everything is fine.

For non-default, I am having problems.

I built Openmpi-1.2.6 (and 1.2.7) with the following configure options:

# module load intel/10.1
# ./configure CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort 
--prefix=/opt/openmpi/1.2.7-intel-10.1 --without-
gridengine --enable-io-romio 
--with-io-romio-flags=--with-file-sys=nfs+ufs 
--with-openib=/opt/hjet/ofed/1.3.1


When I launch a job, I run the module command for the right 
compiler/MPI version to set the paths
correctly.  Mpirun passes LD_LIBRARY_PATH to the executable I am 
launching, but not orted.


When orted is launched on the remote system, the LD_LIBRARY_PATH
doesn't come with, and the Intel 10.1 libraries can't be found.

/opt/openmpi/1.2.7-intel-10.1/bin/orted: error while loading shared 
libraries: libintlc.so.5: cannot open shared object file: No such 
file or directory


How do others solve this problem?

Thanks,
Craig


--
Craig Tierney (craig.tier...@noaa.gov)



--
Craig Tierney (craig.tier...@noaa.gov)




--
Craig Tierney (craig.tier...@noaa.gov)


Re: [OMPI users] Passing LD_LIBRARY_PATH to orted

2008-10-14 Thread Craig Tierney

Ralph Castain wrote:
You might consider using something like  "module" - we use that system 
for exactly this reason. Works quite well and solves the multiple 
compiler issue.




This is the problem.  We use modules to switch compilers/MPI stacks.
When a job is launched, the LD_LIBRARY_PATH from the current environment
is not passed to orted for its own use (it is only handed to orted so it
can be passed on to the launched executable).

Craig





Ralph

On Oct 14, 2008, at 12:56 PM, Craig Tierney wrote:


George Bosilca wrote:
The option to expand the remote LD_LIBRARY_PATH, in such a way that 
Open MPI related applications have their dependencies satisfied, is 
in the trunk. The fact that the compiler requires some 
LD_LIBRARY_PATH is out of the scope of an MPI implementation, and I 
don't think we should take care of it.
Passing the local LD_LIBRARY_PATH to the remote nodes doesn't make 
much sense. There are plenty of environment, where the head node have 
a different configuration than the compute nodes. Again, in this case 
my original solution seems not that bad. If you copy (or make a link 
if you prefer) in the Open MPI lib directory to the compiler shared 
libraries, this will work.

 george.


This does work.  It just increases maintenance for each new version
of OpenMPI.   How often does a head node have a different configuration
than the compute node?  It would see that this would even more support 
the

passing of LD_LIBRARY_PATH for OpenMPI tools to support a heterogeneous
configuration as you described.


Thanks,
Craig





On Oct 14, 2008, at 12:11 PM, Craig Tierney wrote:

George Bosilca wrote:

Craig,
This is a problem with the Intel libraries and not the Open MPI 
ones. You have to somehow make these libraries available on the 
compute nodes.
What I usually do (but it's not the best way to solve this problem) 
is to copy these libraries somewhere on my home area and to add the 
directory to my LD_LIBRARY_PATH.

george.


This is ok when you only ever use one compiler, but it isn't very 
flexible.
I want to keep it as simple as possible for my users, while having a 
maintainable

system.

The libraries are on the compute nodes, the problem deals with 
supporting

multiple versions of compilers.  I can't just list all of the lib paths
in ld.so.conf, because then the user will never get the correct 
one.  I can't
specify a static LD_LIBRARY_PATH for the same reason.  I would 
prefer not

to build my system libraries static.

To the OpenMPI developers, what is your opinion on changing 
orterun/mpirun
to pass LD_LIBRARY_PATH to the remote hosts when starting OpenMPI 
processes?

By hand, all that would be done is:

env LD_LIBRARY_PATH=$LD_LIBRARY_PATH $OPMIPATH/orted 

This would ensure that orted is launched correctly.

Or is it better to just build the OpenMPI tools statically?  We also
use other compilers (PGI, Lahey) so I need a solution that works for
all of them.

Thanks,
Craig




On Oct 10, 2008, at 6:17 PM, Craig Tierney wrote:
I am having problems launching openmpi jobs on my system.  I 
support multiple versions
of MPI and compilers using GNU Modules.  For the default compiler, 
everything is fine.

For non-default, I am having problems.

I built Openmpi-1.2.6 (and 1.2.7) with the following configure 
options:


# module load intel/10.1
# ./configure CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort 
--prefix=/opt/openmpi/1.2.7-intel-10.1 --without-
gridengine --enable-io-romio 
--with-io-romio-flags=--with-file-sys=nfs+ufs 
--with-openib=/opt/hjet/ofed/1.3.1


When I launch a job, I run the module command for the right 
compiler/MPI version to set the paths
correctly.  Mpirun passes LD_LIBRARY_PATH to the executable I am 
launching, but not orted.


When orted is launched on the remote system, the LD_LIBRARY_PATH
doesn't come with, and the Intel 10.1 libraries can't be found.

/opt/openmpi/1.2.7-intel-10.1/bin/orted: error while loading 
shared libraries: libintlc.so.5: cannot open shared object file: 
No such file or directory


How do others solve this problem?

Thanks,
Craig


--
Craig Tierney (craig.tier...@noaa.gov)



--
Craig Tierney (craig.tier...@noaa.gov)



--
Craig Tierney (craig.tier...@noaa.gov)

Re: [OMPI users] Passing LD_LIBRARY_PATH to orted

2008-10-14 Thread Craig Tierney

Ralph Castain wrote:
I -think- there is...at least here, it does seem to behave that way on 
our systems. Not sure if there is something done locally to make it work.


Also, though, I have noted that LD_LIBRARY_PATH does seem to be getting 
forwarded on the 1.3 branch in some environments. OMPI isn't doing it 
directly to the best of my knowledge, but I think the base environment 
might be. Specifically, I noticed it on slurm earlier today. I'll check 
the others as far as I can.


Craig: what environment are you using? ssh?
Ralph




We are using ssh (we do not use tight integration in SGE).

Craig






On Oct 14, 2008, at 1:18 PM, George Bosilca wrote:

I use modules too, but they only work locally. Or is there a feature 
in "module" to automatically load the list of currently loaded local 
modules remotely ?


 george.

On Oct 14, 2008, at 3:03 PM, Ralph Castain wrote:

You might consider using something like  "module" - we use that 
system for exactly this reason. Works quite well and solves the 
multiple compiler issue.


Ralph

On Oct 14, 2008, at 12:56 PM, Craig Tierney wrote:


George Bosilca wrote:
The option to expand the remote LD_LIBRARY_PATH, in such a way that 
Open MPI related applications have their dependencies satisfied, is 
in the trunk. The fact that the compiler requires some 
LD_LIBRARY_PATH is out of the scope of an MPI implementation, and I 
don't think we should take care of it.
Passing the local LD_LIBRARY_PATH to the remote nodes doesn't make 
much sense. There are plenty of environment, where the head node 
have a different configuration than the compute nodes. Again, in 
this case my original solution seems not that bad. If you copy (or 
make a link if you prefer) in the Open MPI lib directory to the 
compiler shared libraries, this will work.

george.


This does work.  It just increases maintenance for each new version
of OpenMPI.   How often does a head node have a different configuration
than the compute node?  It would see that this would even more 
support the

passing of LD_LIBRARY_PATH for OpenMPI tools to support a heterogeneous
configuration as you described.


Thanks,
Craig





On Oct 14, 2008, at 12:11 PM, Craig Tierney wrote:

George Bosilca wrote:

Craig,
This is a problem with the Intel libraries and not the Open MPI 
ones. You have to somehow make these libraries available on the 
compute nodes.
What I usually do (but it's not the best way to solve this 
problem) is to copy these libraries somewhere on my home area and 
to add the directory to my LD_LIBRARY_PATH.

george.


This is ok when you only ever use one compiler, but it isn't very 
flexible.
I want to keep it as simple as possible for my users, while having 
a maintainable

system.

The libraries are on the compute nodes, the problem deals with 
supporting
multiple versions of compilers.  I can't just list all of the lib 
paths
in ld.so.conf, because then the user will never get the correct 
one.  I can't
specify a static LD_LIBRARY_PATH for the same reason.  I would 
prefer not

to build my system libraries static.

To the OpenMPI developers, what is your opinion on changing 
orterun/mpirun
to pass LD_LIBRARY_PATH to the remote hosts when starting OpenMPI 
processes?

By hand, all that would be done is:

env LD_LIBRARY_PATH=$LD_LIBRARY_PATH $OPMIPATH/orted 

This would ensure that orted is launched correctly.

Or is it better to just build the OpenMPI tools statically?  We also
use other compilers (PGI, Lahey) so I need a solution that works for
all of them.

Thanks,
Craig




On Oct 10, 2008, at 6:17 PM, Craig Tierney wrote:
I am having problems launching openmpi jobs on my system.  I 
support multiple versions
of MPI and compilers using GNU Modules.  For the default 
compiler, everything is fine.

For non-default, I am having problems.

I built Openmpi-1.2.6 (and 1.2.7) with the following configure 
options:


# module load intel/10.1
# ./configure CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort 
--prefix=/opt/openmpi/1.2.7-intel-10.1 --without-
gridengine --enable-io-romio 
--with-io-romio-flags=--with-file-sys=nfs+ufs 
--with-openib=/opt/hjet/ofed/1.3.1


When I launch a job, I run the module command for the right 
compiler/MPI version to set the paths
correctly.  Mpirun passes LD_LIBRARY_PATH to the executable I am 
launching, but not orted.


When orted is launched on the remote system, the LD_LIBRARY_PATH
doesn't come with, and the Intel 10.1 libraries can't be found.

/opt/openmpi/1.2.7-intel-10.1/bin/orted: error while loading 
shared libraries: libintlc.so.5: cannot open shared object file: 
No such file or directory


How do others solve this problem?

Thanks,
Craig


--
Craig Tierney (craig.tier...@noaa.gov)

Re: [OMPI users] Passing LD_LIBRARY_PATH to orted

2008-10-14 Thread Craig Tierney

Reuti wrote:


Am 14.10.2008 um 23:18 schrieb Craig Tierney:


Ralph Castain wrote:
I -think- there is...at least here, it does seem to behave that way 
on our systems. Not sure if there is something done locally to make 
it work.
Also, though, I have noted that LD_LIBRARY_PATH does seem to be 
getting forwarded on the 1.3 branch in some environments. OMPI isn't 
doing it directly to the best of my knowledge, but I think the base 
environment might be. Specifically, I noticed it on slurm earlier 
today. I'll check the others as far as I can.

Craig: what environment are you using? ssh?
Ralph


We are using ssh (we do not use tight integration in SGE).


Hi Craig, may I ask why? You compiled Open MPI without SGE support, as 
in 1.2.7 it's in by default AFAIK? - Reuti





Only because we don't have it turned on.  When we first started using
SGE around 2002, we weren't using tight integration.  It is on our list
of things to do, but it is not trivial to just turn on and validate.  We
compiled all versions of OpenMPI we have used (1.2.4, 1.2.6, and 1.2.7)
with --without-gridengine.

Craig







Craig





On Oct 14, 2008, at 1:18 PM, George Bosilca wrote:
I use modules too, but they only work locally. Or is there a feature 
in "module" to automatically load the list of currently loaded local 
modules remotely ?


 george.

On Oct 14, 2008, at 3:03 PM, Ralph Castain wrote:

You might consider using something like  "module" - we use that 
system for exactly this reason. Works quite well and solves the 
multiple compiler issue.


Ralph

On Oct 14, 2008, at 12:56 PM, Craig Tierney wrote:


George Bosilca wrote:
The option to expand the remote LD_LIBRARY_PATH, in such a way 
that Open MPI related applications have their dependencies 
satisfied, is in the trunk. The fact that the compiler requires 
some LD_LIBRARY_PATH is out of the scope of an MPI 
implementation, and I don't think we should take care of it.
Passing the local LD_LIBRARY_PATH to the remote nodes doesn't 
make much sense. There are plenty of environment, where the head 
node have a different configuration than the compute nodes. 
Again, in this case my original solution seems not that bad. If 
you copy (or make a link if you prefer) in the Open MPI lib 
directory to the compiler shared libraries, this will work.

george.


This does work.  It just increases maintenance for each new version
of OpenMPI.   How often does a head node have a different 
configuration
than the compute node?  It would see that this would even more 
support the
passing of LD_LIBRARY_PATH for OpenMPI tools to support a 
heterogeneous

configuration as you described.


Thanks,
Craig





On Oct 14, 2008, at 12:11 PM, Craig Tierney wrote:

George Bosilca wrote:

Craig,
This is a problem with the Intel libraries and not the Open MPI 
ones. You have to somehow make these libraries available on the 
compute nodes.
What I usually do (but it's not the best way to solve this 
problem) is to copy these libraries somewhere on my home area 
and to add the directory to my LD_LIBRARY_PATH.

george.


This is ok when you only ever use one compiler, but it isn't 
very flexible.
I want to keep it as simple as possible for my users, while 
having a maintainable

system.

The libraries are on the compute nodes, the problem deals with 
supporting
multiple versions of compilers.  I can't just list all of the 
lib paths
in ld.so.conf, because then the user will never get the correct 
one.  I can't
specify a static LD_LIBRARY_PATH for the same reason.  I would 
prefer not

to build my system libraries static.

To the OpenMPI developers, what is your opinion on changing 
orterun/mpirun
to pass LD_LIBRARY_PATH to the remote hosts when starting 
OpenMPI processes?

By hand, all that would be done is:

env LD_LIBRARY_PATH=$LD_LIBRARY_PATH $OPMIPATH/orted 

This would ensure that orted is launched correctly.

Or is it better to just build the OpenMPI tools statically?  We 
also
use other compilers (PGI, Lahey) so I need a solution that works 
for

all of them.

Thanks,
Craig




On Oct 10, 2008, at 6:17 PM, Craig Tierney wrote:
I am having problems launching openmpi jobs on my system.  I 
support multiple versions
of MPI and compilers using GNU Modules.  For the default 
compiler, everything is fine.

For non-default, I am having problems.

I built Openmpi-1.2.6 (and 1.2.7) with the following configure 
options:


# module load intel/10.1
# ./configure CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort 
--prefix=/opt/openmpi/1.2.7-intel-10.1 --without-
gridengine --enable-io-romio 
--with-io-romio-flags=--with-file-sys=nfs+ufs 
--with-openib=/opt/hjet/ofed/1.3.1


When I launch a job, I run the module command for the right 
compiler/MPI version to set the paths
correctly.  Mpirun passes LD_LIBRARY_PATH to the executable I 
am launching, but not orted.


When orted is launched on the remote system, the LD_LIBRARY_PATH
doesn't come with, and the Intel 10.1 libraries can't be found.

Re: [OMPI users] Passing LD_LIBRARY_PATH to orted

2008-10-14 Thread Craig Tierney

Reuti wrote:

Am 14.10.2008 um 23:39 schrieb Craig Tierney:


Reuti wrote:

Am 14.10.2008 um 23:18 schrieb Craig Tierney:

Ralph Castain wrote:
I -think- there is...at least here, it does seem to behave that way 
on our systems. Not sure if there is something done locally to make 
it work.
Also, though, I have noted that LD_LIBRARY_PATH does seem to be 
getting forwarded on the 1.3 branch in some environments. OMPI 
isn't doing it directly to the best of my knowledge, but I think 
the base environment might be. Specifically, I noticed it on slurm 
earlier today. I'll check the others as far as I can.

Craig: what environment are you using? ssh?
Ralph


We are using ssh (we do not use tight integration in SGE).
Hi Craig, may I ask why? You compiled Open MPI without SGE support, 
as in 1.2.7 it's in by default AFAIK? - Reuti


Only because we don't have it on.  When we first started using
SGE around 2002, we hadn't used it.  It is on our list of things to


This was still Codine 5.3 - or already SGE?


We started with SGE 5.3.  It was definitely SGE.



do, but it is not trivial to just turn on and validate.   We compiled 
all versions
of OpenMPI we have used (1.2.4,1.2.6, and 1.2.7) with 
--without-gridengine.


It's built-in and you don't need any special start- or stop_proc_args, 
just /bin/true will do. It could even be, that copying the CODINE_* to 
SGE_* might make Open MPI usable with Codine.


If you want to set some things for ssh login, you can put the necessary 
things in ~/.bashrc.




Thanks for the tips, I will try it out when I get a chance (and now
we are very off topic).

Craig




-- Reuti




Craig







Craig





On Oct 14, 2008, at 1:18 PM, George Bosilca wrote:
I use modules too, but they only work locally. Or is there a 
feature in "module" to automatically load the list of currently 
loaded local modules remotely ?


 george.

On Oct 14, 2008, at 3:03 PM, Ralph Castain wrote:

You might consider using something like  "module" - we use that 
system for exactly this reason. Works quite well and solves the 
multiple compiler issue.


Ralph

On Oct 14, 2008, at 12:56 PM, Craig Tierney wrote:


George Bosilca wrote:
The option to expand the remote LD_LIBRARY_PATH, in such a way 
that Open MPI related applications have their dependencies 
satisfied, is in the trunk. The fact that the compiler requires 
some LD_LIBRARY_PATH is out of the scope of an MPI 
implementation, and I don't think we should take care of it.
Passing the local LD_LIBRARY_PATH to the remote nodes doesn't 
make much sense. There are plenty of environment, where the 
head node have a different configuration than the compute 
nodes. Again, in this case my original solution seems not that 
bad. If you copy (or make a link if you prefer) in the Open MPI 
lib directory to the compiler shared libraries, this will work.

george.


This does work.  It just increases maintenance for each new version
of OpenMPI.   How often does a head node have a different 
configuration
than the compute node?  It would see that this would even more 
support the
passing of LD_LIBRARY_PATH for OpenMPI tools to support a 
heterogeneous

configuration as you described.


Thanks,
Craig





On Oct 14, 2008, at 12:11 PM, Craig Tierney wrote:

George Bosilca wrote:

Craig,
This is a problem with the Intel libraries and not the Open 
MPI ones. You have to somehow make these libraries available 
on the compute nodes.
What I usually do (but it's not the best way to solve this 
problem) is to copy these libraries somewhere on my home area 
and to add the directory to my LD_LIBRARY_PATH.

george.


This is ok when you only ever use one compiler, but it isn't 
very flexible.
I want to keep it as simple as possible for my users, while 
having a maintainable

system.

The libraries are on the compute nodes, the problem deals with 
supporting
multiple versions of compilers.  I can't just list all of the 
lib paths
in ld.so.conf, because then the user will never get the 
correct one.  I can't
specify a static LD_LIBRARY_PATH for the same reason.  I would 
prefer not

to build my system libraries static.

To the OpenMPI developers, what is your opinion on changing 
orterun/mpirun
to pass LD_LIBRARY_PATH to the remote hosts when starting 
OpenMPI processes?

By hand, all that would be done is:

env LD_LIBRARY_PATH=$LD_LIBRARY_PATH $OPMIPATH/orted 

This would ensure that orted is launched correctly.

Or is it better to just build the OpenMPI tools statically?  
We also
use other compilers (PGI, Lahey) so I need a solution that 
works for

all of them.

Thanks,
Craig




On Oct 10, 2008, at 6:17 PM, Craig Tierney wrote:
I am having problems launching openmpi jobs on my system.  I 
support multiple versions
of MPI and compilers using GNU Modules.  For the default 
compiler, everything is fine.

For non-default, I am having problems.

I built Openmpi-1.2.6 (and 1.2.7) with the following configure options:

[OMPI users] Problem launching jobs in SGE (with loose integration), OpenMPI 1.3.3

2009-07-23 Thread Craig Tierney

I have built OpenMPI 1.3.3 without support for SGE.
I just want to launch jobs with loose integration right
now.

Here is how I configured it:

./configure CC=pgcc CXX=pgCC F77=pgf90 F90=pgf90 FC=pgf90 
--prefix=/opt/openmpi/1.3.3-pgi --without-sge
 --enable-io-romio --with-openib=/opt/hjet/ofed/1.4.1 
--with-io-romio-flags=--with-file-system=lustre 
--enable-orterun-prefix-by-default


I can start jobs from the commandline just fine.  When
I try to do the same thing inside an SGE job, I get
errors like the following:


error: executing task of job 5041155 failed:
--
A daemon (pid 13324) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--
mpirun: clean termination accomplished


I am starting mpirun with the following options:

$OMPI/bin/mpirun -mca btl openib,sm,self --mca pls ^sge \
-machinefile $MACHINE_FILE -x LD_LIBRARY_PATH -np 16 ./xhpl

The options are to ensure I am using IB, that SGE is not used, and that
the LD_LIBRARY_PATH is sent along to ensure dynamic linking is done 
correctly.


This worked with 1.2.7 (except setting the pls option as gridengine 
instead of sge), but I can't get it to work with 1.3.3.


Am I missing something obvious for getting jobs with loose integration
started?

Thanks,
Craig



Re: [OMPI users] Problem launching jobs in SGE (with loose integration), OpenMPI 1.3.3

2009-07-23 Thread Craig Tierney
Rolf Vandevaart wrote:
> I think what you are looking for is this:
> 
> --mca plm_rsh_disable_qrsh 1
> 
> This means we will disable the use of qrsh and use rsh or ssh instead.
> 
> The --mca pls ^sge does not work anymore for two reasons.  First, the
> "pls" framework was renamed "plm".  Secondly, the gridgengine plm was
> folded into the rsh/ssh one.
> 

Rolf,

Thanks for the quick reply.  That solved the problem.

Craig
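
For reference, the original launch line with Rolf's fix folded in (a sketch;
everything other than the plm option is unchanged from the command quoted
below):

$OMPI/bin/mpirun --mca plm_rsh_disable_qrsh 1 -mca btl openib,sm,self \
    -machinefile $MACHINE_FILE -x LD_LIBRARY_PATH -np 16 ./xhpl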


> A few more details at
> http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
> 
> Rolf
> 
> On 07/23/09 10:34, Craig Tierney wrote:
>> I have built OpenMPI 1.3.3 without support for SGE.
>> I just want to launch jobs with loose integration right
>> now.
>>
>> Here is how I configured it:
>>
>> ./configure CC=pgcc CXX=pgCC F77=pgf90 F90=pgf90 FC=pgf90
>> --prefix=/opt/openmpi/1.3.3-pgi --without-sge
>>  --enable-io-romio --with-openib=/opt/hjet/ofed/1.4.1
>> --with-io-romio-flags=--with-file-system=lustre
>> --enable-orterun-prefix-by-default
>>
>> I can start jobs from the commandline just fine.  When
>> I try to do the same thing inside an SGE job, I get
>> errors like the following:
>>
>>
>> error: executing task of job 5041155 failed:
>> --
>>
>> A daemon (pid 13324) died unexpectedly with status 1 while attempting
>> to launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>> the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --
>>
>> --
>>
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --
>>
>> mpirun: clean termination accomplished
>>
>>
>> I am starting mpirun with the following options:
>>
>> $OMPI/bin/mpirun -mca btl openib,sm,self --mca pls ^sge \
>> -machinefile $MACHINE_FILE -x LD_LIBRARY_PATH -np 16 ./xhpl
>>
>> The options are to ensure I am using IB, that SGE is not used, and that
>> the LD_LIBRARY_PATH is sent along to ensure dynamic linking is done
>> correctly.
>>
>> This worked with 1.2.7 (except setting the pls option as gridengine
>> instead of sge), but I can't get it to work with 1.3.3.
>>
>> Am I missing something obvious for getting jobs with loose integration
>> started?
>>
>> Thanks,
>> Craig
>>
> 
> 


-- 
Craig Tierney (craig.tier...@noaa.gov)


[OMPI users] Performance question about OpenMPI and MVAPICH2 on IB

2009-08-06 Thread Craig Tierney
I am running openmpi-1.3.3 on my cluster which is using
OFED-1.4.1 for Infiniband support.  I am comparing performance
between this version of OpenMPI and Mvapich2, and seeing a
very large difference in performance.

The code I am testing is WRF v3.0.1.  I am running the
12km benchmark.

The two builds are the exact same codes and configuration
files.  All I did different was use modules to switch versions
of MPI, and recompiled the code.

Performance:

Cores   Mvapich2Openmpi
---
   8  17.313.9
  16  31.725.9
  32  62.951.6
  64 110.892.8
 128 219.2   189.4
 256 384.5   317.8
 512 687.2   516.7

The performance number is GFlops (so larger is better).

I am calling openmpi as:

/opt/openmpi/1.3.3-intel/bin/mpirun  --mca plm_rsh_disable_qrsh 1 --mca btl 
openib,sm,self \
-machinefile /tmp/6026489.1.qntest.q/machines -x LD_LIBRARY_PATH -np $NSLOTS 
/home/ctierney/bin/noaa_affinity ./wrf.exe

So,

Is this expected?  Are there some common-sense optimizations to use?
Is there a way to verify that I am really using the IB?  When
I try:

-mca bta ^tcp,openib,sm,self

I get the errors:
--
No available btl components were found!

This means that there are no components of this type installed on your
system or all the components reported that they could not be used.

This is a fatal error; your MPI process is likely to abort.  Check the
output of the "ompi_info" command and ensure that components of this
type are available on your system.  You may also wish to check the
value of the "component_path" MCA parameter and ensure that it has at
least one directory that contains valid MCA components.
--

But ompi_info is telling me that I have openib support:

   MCA btl: openib (MCA v2.0, API v2.0, Component v1.3.3)

Note, I did rebuild OFED and put it in a different directory
and did not rebuild OpenMPI.  However, since ompi_info isn't
complaining and the libraries are available, I am thinking that
it isn't a problem.  I could be wrong.

Thanks,
Craig
-- 
Craig Tierney (craig.tier...@noaa.gov)
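
A hedged note on the failing "-mca bta" line above: the parameter name is btl,
and the leading caret excludes the components that follow it, so
^tcp,openib,sm,self disables every BTL at once.  To keep IB and shared memory
while excluding only TCP, the usual forms are:

mpirun --mca btl ^tcp ...
mpirun --mca btl openib,sm,self ...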



Re: [OMPI users] Performance question about OpenMPI and MVAPICH2 on IB

2009-08-06 Thread Craig Tierney
A followup

Part of the problem was affinity.  I had written a script to do processor
and memory affinity (which works fine with MVAPICH2).  It is an
idea that I got from TACC.  However, the script didn't seem to
work correctly with OpenMPI (or I still have bugs).

Setting --mca mpi_paffinity_alone 1 made things better.  However,
the performance is still not as good:

Cores   Mvapich2Openmpi
---
   8  17.317.3
  16  31.731.5
  32  62.962.8
  64 110.8   108.0
 128 219.2   201.4
 256 384.5   342.7
 512 687.2   537.6

The performance number is GFlops (so larger is better).

The first few numbers show that the executable is the right
speed.  I verified that IB is being used by using OMB and
checking latency and bandwidth.  Those numbers are what I
expect (3GB/s, 1.5mu/s for QDR).

However, the Openmpi version is not scaling as well.  Any
ideas on why that might be the case?

Thanks,
Craig


Craig Tierney wrote:
> I am running openmpi-1.3.3 on my cluster which is using
> OFED-1.4.1 for Infiniband support.  I am comparing performance
> between this version of OpenMPI and Mvapich2, and seeing a
> very large difference in performance.
> 
> The code I am testing is WRF v3.0.1.  I am running the
> 12km benchmark.
> 
> The two builds are the exact same codes and configuration
> files.  All I did different was use modules to switch versions
> of MPI, and recompiled the code.
> 
> Performance:
> 
> Cores   Mvapich2Openmpi
> ---
>8  17.313.9
>   16  31.725.9
>   32  62.951.6
>   64 110.892.8
>  128 219.2   189.4
>  256 384.5   317.8
>  512 687.2   516.7
> 
> The performance number is GFlops (so larger is better).
> 
> I am calling openmpi as:
> 
> /opt/openmpi/1.3.3-intel/bin/mpirun  --mca plm_rsh_disable_qrsh 1 --mca btl 
> openib,sm,self \
> -machinefile /tmp/6026489.1.qntest.q/machines -x LD_LIBRARY_PATH -np $NSLOTS 
> /home/ctierney/bin/noaa_affinity ./wrf.exe
> 
> So,
> 
> Is this expected?  Are some common sense optimizations to use?
> Is there a way to verify that I am really using the IB?  When
> I try:
> 
> -mca bta ^tcp,openib,sm,self
> 
> I get the errors:
> --
> No available btl components were found!
> 
> This means that there are no components of this type installed on your
> system or all the components reported that they could not be used.
> 
> This is a fatal error; your MPI process is likely to abort.  Check the
> output of the "ompi_info" command and ensure that components of this
> type are available on your system.  You may also wish to check the
> value of the "component_path" MCA parameter and ensure that it has at
> least one directory that contains valid MCA components.
> --
> 
> But ompi_info is telling me that I have openib support:
> 
>MCA btl: openib (MCA v2.0, API v2.0, Component v1.3.3)
> 
> Note, I did rebuild OFED and put it in a different directory
> and did not rebuild OpenMPI.  However, since ompi_info isn't
> complaining and the libraries are available, I am thinking that
> is isn't a problem.  I could be wrong.
> 
> Thanks,
> Craig


-- 
Craig Tierney (craig.tier...@noaa.gov)


Re: [OMPI users] Performance question about OpenMPI and MVAPICH2 on IB

2009-08-06 Thread Craig Tierney
Gus Correa wrote:
> Hi Craig, list
> 
> I suppose WRF uses MPI collective calls (MPI_Reduce,
> MPI_Bcast, MPI_Alltoall etc),
> just like the climate models we run here do.
> A recursive grep on the source code will tell.
> 

I will check this out.  I am not the WRF expert, but
I was under the impression that most weather models use
nearest-neighbor communication, not collectives.


> If that is the case, you may need to tune the collectives dynamically.
> We are experimenting with tuned collectives here also.
> 
> Specifically, we had a scaling problem with the MITgcm
> (also running on an IB cluster)
> that is probably due to collectives.
> Similar problems were reported on this list before,
> with computational chemistry software.
> See these threads:
> http://www.open-mpi.org/community/lists/users/2009/07/10045.php
> http://www.open-mpi.org/community/lists/users/2009/05/9419.php
> 
> If WRF outputs timing information, particularly the time spent on MPI
> routines, you may also want to compare how the OpenMPI and
> MVAPICH versions fare w.r.t. MPI collectives.
> 
> I hope this helps.
> 

I will look into this.  Thanks for the ideas.

Craig
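
A quick way to do the check Gus suggests, counting collective calls in the
source (a sketch; the WRFV3 directory name is an assumption):

grep -rniE 'MPI_(ALLTOALL|ALLREDUCE|ALLGATHER|BCAST|REDUCE)' WRFV3/ | wc -l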



> Gus Correa
> -
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> -
> 
> 
> 
> Craig Tierney wrote:
>> I am running openmpi-1.3.3 on my cluster which is using
>> OFED-1.4.1 for Infiniband support.  I am comparing performance
>> between this version of OpenMPI and Mvapich2, and seeing a
>> very large difference in performance.
>>
>> The code I am testing is WRF v3.0.1.  I am running the
>> 12km benchmark.
>>
>> The two builds are the exact same codes and configuration
>> files.  All I did different was use modules to switch versions
>> of MPI, and recompiled the code.
>>
>> Performance:
>>
>> Cores   Mvapich2Openmpi
>> ---
>>8  17.313.9
>>   16  31.725.9
>>   32  62.951.6
>>   64 110.892.8
>>  128 219.2   189.4
>>  256 384.5   317.8
>>  512 687.2   516.7
>>
>> The performance number is GFlops (so larger is better).
>>
>> I am calling openmpi as:
>>
>> /opt/openmpi/1.3.3-intel/bin/mpirun  --mca plm_rsh_disable_qrsh 1
>> --mca btl openib,sm,self \
>> -machinefile /tmp/6026489.1.qntest.q/machines -x LD_LIBRARY_PATH -np
>> $NSLOTS /home/ctierney/bin/noaa_affinity ./wrf.exe
>>
>> So,
>>
>> Is this expected?  Are some common sense optimizations to use?
>> Is there a way to verify that I am really using the IB?  When
>> I try:
>>
>> -mca bta ^tcp,openib,sm,self
>>
>> I get the errors:
>> --
>>
>> No available btl components were found!
>>
>> This means that there are no components of this type installed on your
>> system or all the components reported that they could not be used.
>>
>> This is a fatal error; your MPI process is likely to abort.  Check the
>> output of the "ompi_info" command and ensure that components of this
>> type are available on your system.  You may also wish to check the
>> value of the "component_path" MCA parameter and ensure that it has at
>> least one directory that contains valid MCA components.
>> --
>>
>>
>> But ompi_info is telling me that I have openib support:
>>
>>MCA btl: openib (MCA v2.0, API v2.0, Component v1.3.3)
>>
>> Note, I did rebuild OFED and put it in a different directory
>> and did not rebuild OpenMPI.  However, since ompi_info isn't
>> complaining and the libraries are available, I am thinking that
>> is isn't a problem.  I could be wrong.
>>
>> Thanks,
>> Craig
> 
> 


-- 
Craig Tierney (craig.tier...@noaa.gov)


Re: [OMPI users] Performance question about OpenMPI and MVAPICH2 on IB

2009-08-07 Thread Craig Tierney

Terry Dontje wrote:

Craig,

Did your affinity script bind the processes per socket or linearly to
cores?  If the former, you'll want to look at using rankfiles and place
the ranks based on sockets.  We have found this especially useful if
you are not running fully subscribed on your machines.




The script binds them to sockets and also binds memory per node.
It is smart enough that if the machine_file does not use all
the cores (because the user reordered them) then the script will
lay out the tasks evenly between the two sockets.

Also, if you think the main issue is collectives performance you may 
want to try using the hierarchical and SM collectives.  However, be 
forewarned we are right now trying to pound out some errors with these 
modules.  To enable them you add the following parameters "--mca 
coll_hierarch_priority 100 --mca coll_sm_priority 100".  We would be 
very interested in any results you get (failures, improvements, 
non-improvements).




I don't know why it is slow.  OpenMPI is very flexible in how the
stack can be tuned.  But I also have hundreds of users running dozens
of major codes, and what I need is a set of options that 'just work'
in most cases.

I will try the above options and get back to you.

Craig
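
A sketch of the launch line with the options from this thread folded in (the
two coll_* priorities are the ones Terry lists above; the machine file is a
placeholder):

/opt/openmpi/1.3.3-intel/bin/mpirun --mca plm_rsh_disable_qrsh 1 \
    --mca btl openib,sm,self --mca mpi_paffinity_alone 1 \
    --mca coll_hierarch_priority 100 --mca coll_sm_priority 100 \
    -machinefile $MACHINE_FILE -x LD_LIBRARY_PATH -np $NSLOTS ./wrf.exe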





thanks,

--td


Message: 4
Date: Thu, 06 Aug 2009 17:03:08 -0600
From: Craig Tierney 
Subject: Re: [OMPI users] Performance question about OpenMPI and
MVAPICH2 onIB
To: Open MPI Users 
Message-ID: <4a7b612c.8070...@noaa.gov>
Content-Type: text/plain; charset=ISO-8859-1

A followup

Part of problem was affinity.  I had written a script to do processor
and memory affinity (which works fine with MVAPICH2).  It is an
idea that I got from TACC.  However, the script didn't seem to
work correctly with OpenMPI (or I still have bugs).

Setting --mca mpi_paffinity_alone 1 made things better.  However,
the performance is still not as good:

Cores   Mvapich2Openmpi
---
   8  17.317.3
  16  31.731.5
  32  62.962.8
  64 110.8   108.0
 128 219.2   201.4
 256 384.5   342.7
 512 687.2   537.6

The performance number is GFlops (so larger is better).

The first few numbers show that the executable is the right
speed.  I verified that IB is being used by using OMB and
checking latency and bandwidth.  Those numbers are what I
expect (3GB/s, 1.5mu/s for QDR).

However, the Openmpi version is not scaling as well.  Any
ideas on why that might be the case?

Thanks,
Craig







Re: [OMPI users] Performance question about OpenMPI and MVAPICH2 on IB

2009-08-07 Thread Craig Tierney

nee...@crlindia.com wrote:

Hi Craig,

How was the nodefile selected for execution?  Was it provided by a
scheduler such as LSF/SGE/PBS, or did you supply it manually?
With WRF, we observed that giving sequential nodes (blades in the same
order as in the enclosure) gave us some performance benefit.


Regards



I figured this might be the case.  Right now the batch system
is giving the nodes to the application.  They are not sorted,
and I have considered doing that.  I have also launched numerous
cases of one problem size, and I don't see enough variation
in run time to explain the differences between the MPI stacks.

Craig





Neeraj Chourasia (MTS)
Computational Research Laboratories Ltd.
(A wholly Owned Subsidiary of TATA SONS Ltd)
B-101, ICC Trade Towers, Senapati Bapat Road
Pune 411016 (Mah) INDIA
(O) +91-20-6620 9863  (Fax) +91-20-6620 9862
M: +91.9225520634



*Craig Tierney *
Sent by: users-boun...@open-mpi.org
08/07/2009 04:43 AM

To: Open MPI Users
Reply-To: Open MPI Users
Subject: Re: [OMPI users] Performance question about OpenMPI and MVAPICH2 on IB


Gus Correa wrote:
 > Hi Craig, list
 >
 > I suppose WRF uses MPI collective calls (MPI_Reduce,
 > MPI_Bcast, MPI_Alltoall etc),
 > just like the climate models we run here do.
 > A recursive grep on the source code will tell.
 >

I will check this out.  I am not the WRF expert, but
I was under the impression that most weather models are dominated by
nearest-neighbor communication, not collectives.
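
A quick recursive check on the WRF source tree would be something like this
(the directory name is a placeholder):

grep -rli -e MPI_Alltoall -e MPI_Allreduce -e MPI_Reduce -e MPI_Bcast WRFV3/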


 > If that is the case, you may need to tune the collectives dynamically.
 > We are experimenting with tuned collectives here also.
 >
 > Specifically, we had a scaling problem with the MITgcm
 > (also running on an IB cluster)
 > that is probably due to collectives.
 > Similar problems were reported on this list before,
 > with computational chemistry software.
 > See these threads:
 > http://www.open-mpi.org/community/lists/users/2009/07/10045.php
 > http://www.open-mpi.org/community/lists/users/2009/05/9419.php
 >
 > If WRF outputs timing information, particularly the time spent on MPI
 > routines, you may also want to compare how the OpenMPI and
 > MVAPICH versions fare w.r.t. MPI collectives.
 >
 > I hope this helps.
 >

I will look into this.  Thanks for the ideas.

Craig



 > Gus Correa
 > -
 > Gustavo Correa
 > Lamont-Doherty Earth Observatory - Columbia University
 > Palisades, NY, 10964-8000 - USA
 > -
 >
 >
 >
 > Craig Tierney wrote:
 >> I am running openmpi-1.3.3 on my cluster which is using
 >> OFED-1.4.1 for Infiniband support.  I am comparing performance
 >> between this version of OpenMPI and Mvapich2, and seeing a
 >> very large difference in performance.
 >>
 >> The code I am testing is WRF v3.0.1.  I am running the
 >> 12km benchmark.
 >>
 >> The two builds are the exact same codes and configuration
 >> files.  All I did different was use modules to switch versions
 >> of MPI, and recompiled the code.
 >>
 >> Performance:
 >>
 >> Cores   Mvapich2   Openmpi
 >> --------------------------
 >>     8       17.3      13.9
 >>    16       31.7      25.9
 >>    32       62.9      51.6
 >>    64      110.8      92.8
 >>   128      219.2     189.4
 >>   256      384.5     317.8
 >>   512      687.2     516.7
 >>
 >> The performance number is GFlops (so larger is better).
 >>
 >> I am calling openmpi as:
 >>
 >> /opt/openmpi/1.3.3-intel/bin/mpirun  --mca plm_rsh_disable_qrsh 1
 >> --mca btl openib,sm,self \
 >> -machinefile /tmp/6026489.1.qntest.q/machines -x LD_LIBRARY_PATH -np
 >> $NSLOTS /home/ctierney/bin/noaa_affinity ./wrf.exe
 >>
 >> So,
 >>
 >> Is this expected?  Are some common sense optimizations to use?
 >> Is there a way to verify that I am really using the IB?  When
 >> I try:
 >>
 >> -mca bta ^tcp,openib,sm,self
 >>
 >> I get the errors:
 >> --------------------------------------------------------------------------
 >>
 >> No available btl components were found!
 >>
 >> This means that there are no components of this type installed on your
 >> system or all the components reported that they could not be used.
 >>
 >> This is a fatal error; your MPI process is likely to abort.  Check the
 >> output of the "ompi_info" command and ensure that components of this
 >> type are available on your system.  You may also wish to check the
 >> value of the "component_path" MCA parameter and ensure that it h
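
One note on the exclusion attempt above: the caret applies to every component
listed after it, so "^tcp,openib,sm,self" disables all four BTLs, which is
consistent with the "No available btl components" message (and the parameter
name should be btl, not bta).  To exclude only TCP, or to include only the IB
and shared-memory paths, the usual forms are:

mpirun --mca btl ^tcp ...
mpirun --mca btl openib,sm,self ...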

Re: [OMPI users] Performance question about OpenMPI and MVAPICH2 on IB

2009-08-07 Thread Craig Tierney

Terry Dontje wrote:

Craig,

Did your affinity script bind the processes per socket or linearly to 
cores?  If the former, you'll want to look at using rankfiles and placing 
the ranks based on sockets.  We have found this especially useful if 
you are not running fully subscribed on your machines.


Also, if you think the main issue is collectives performance you may 
want to try using the hierarchical and SM collectives.  However, be 
forewarned we are right now trying to pound out some errors with these 
modules.  To enable them you add the following parameters "--mca 
coll_hierarch_priority 100 --mca coll_sm_priority 100".  We would be 
very interested in any results you get (failures, improvements, 
non-improvements).




Adding these two options causes the code to segfault at startup.

Craig





thanks,

--td


Message: 4
Date: Thu, 06 Aug 2009 17:03:08 -0600
From: Craig Tierney 
Subject: Re: [OMPI users] Performance question about OpenMPI and
MVAPICH2 onIB
To: Open MPI Users 
Message-ID: <4a7b612c.8070...@noaa.gov>
Content-Type: text/plain; charset=ISO-8859-1

A follow-up:

Part of the problem was affinity.  I had written a script to do processor
and memory affinity (which works fine with MVAPICH2).  It is an
idea that I got from TACC.  However, the script didn't seem to
work correctly with OpenMPI (or I still have bugs).

Setting --mca mpi_paffinity_alone 1 made things better.  However,
the performance is still not as good:

Cores   Mvapich2   Openmpi
--------------------------
    8       17.3      17.3
   16       31.7      31.5
   32       62.9      62.8
   64      110.8     108.0
  128      219.2     201.4
  256      384.5     342.7
  512      687.2     537.6

The performance number is GFlops (so larger is better).

The first few numbers show that the executable is running at the right
speed.  I verified that IB is being used by running OMB (the OSU
Micro-Benchmarks) and checking latency and bandwidth.  Those numbers are
what I expect (3 GB/s, 1.5 µs for QDR).

However, the Openmpi version is not scaling as well.  Any
ideas on why that might be the case?

Thanks,
Craig


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





[OMPI users] Failure trying to use tuned collectives

2009-08-07 Thread Craig Tierney

To use tuned collectives, is adding --mca coll tuned all I have to do?

I am trying to run with:

# mpirun -np 8 --mca coll tuned --mca orte_base_help_aggregate 0 ./wrf.exe

But all the processes fail with the following message:

--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  mca_coll_base_comm_select(MPI_COMM_WORLD) failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--

Thanks,
Craig
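
A hedged note on the failure above: restricting the coll framework to a single
component also drops the basic/self components that some communicators need,
which is what the "Not found" return from mca_coll_base_comm_select suggests.
Two gentler ways to favor the tuned module would be along these lines:

mpirun -np 8 --mca coll_tuned_priority 100 ./wrf.exe
mpirun -np 8 --mca coll tuned,sm,basic,self ./wrf.exe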



[OMPI users] Unable to run WRF on large core counts (1024+), queue pair error

2009-12-17 Thread Craig Tierney

I am trying to run WRF on 1024 cores with OpenMPI 1.3.3 and
1.4.  I can get the code to run with 512 cores, but it crashes
at startup on 1024 cores.  I am getting the following error message:

[n172][[43536,1],0][connect/btl_openib_connect_oob.c:463:qp_create_one] error creating qp errno says Cannot allocate memory
[n172][[43536,1],0][connect/btl_openib_connect_oob.c:809:rml_recv_cb] error in endpoint reply start connect

Based on what I found via Google, I have tried changing the settings for
btl_openib_receive_queues, but my attempts have not worked.  Here is my
latest try at reducing the total number of queue pairs.

mpirun -np 1024 \
   -mca btl_openib_receive_queues P,128,2048,128,128:S,65536,256,192,128 \
   ./wrf.exe

These settings did not help.

Am I looking in the right place?

System setup:
Centos-5.3
Ofed-1.4.1
Intel Compiler 11.1.038
Openmpi-1.3.3 and 1.4

Build options:

./configure CC=icc CXX=icpc F77=ifort F90=ifort FC=ifort \
   --prefix=/opt/openmpi/1.3.3-intel --without-sge --with-openib \
   --enable-io-romio --with-io-romio-flags=--with-file-system=lustre --with-pic


Thanks,
Craig
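
As a sketch rather than a verified fix: the per-peer (P) queue posts dedicated
receive buffers for every connected peer, which is often what exhausts memory
at 1024 ranks, while shared receive queues (S) keep that footprint roughly
constant.  An SRQ-only specification would look roughly like the line below;
the sizes are illustrative only, and locked-memory limits (ulimit -l) plus the
mlx4 log_num_mtt module parameter are the other usual suspects.

mpirun -np 1024 \
   -mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32 \
   ./wrf.exe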


Re: [OMPI users] openMPI on Xgrid

2010-03-30 Thread Craig Tierney
Jody Klymak wrote:
> 
> On Mar 30, 2010, at  11:12 AM, Cristobal Navarro wrote:
> 
>> I just have some questions: Torque requires Moab, but from what I've
>> read on the site you have to buy Moab, right?
> 
> I am pretty sure you can download torque w/o moab.  I do not use moab,
> which I think is a higher-level scheduling layer on top of pbs. 
> However, there are folks here who would know far more than I do about
> these sorts of things.
> 
> Cheers,  Jody
> 

Moab is a scheduler that works with Torque and several other
products.  Torque comes with a basic scheduler of its own, so Moab is
not required.  If you want more features but do not want to pay for
Moab, you can look at Maui.

Craig
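
For reference, a bare-bones Torque submission script for an Open MPI job looks
something like this (node counts, job name, and paths are placeholders; an
Open MPI built with tm support reads the allocation itself, otherwise pass
$PBS_NODEFILE explicitly):

---
#!/bin/bash
#PBS -N wrf_test
#PBS -l nodes=2:ppn=8,walltime=01:00:00

cd $PBS_O_WORKDIR
# With tm support, mpirun discovers the allocated nodes on its own:
mpirun -np 16 ./wrf.exe
# Without tm support, hand it the Torque node list:
# mpirun -machinefile $PBS_NODEFILE -np 16 ./wrf.exe
---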




> -- 
> Jody Klymak
> http://web.uvic.ca/~jklymak/
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users