[OMPI users] compiling mpptest using OpenMPI

2007-02-15 Thread Eric Thibodeau
Hello all,

I have been attempting to compile mpptest on my nodes in vain. Here is 
my current setup:

Openmpi is in "$HOME/openmpi_`uname -m`" which translates to 
"/export/home/eric/openmpi_i686/". I tried the following approaches (you can 
see some of these were out of desperation):

CFLAGS=`mpicc --showme:compile` LDFLAGS=`mpicc --showme:link` ./configure

Configure fails on:
checking whether the C compiler works... configure: error: cannot run C 
compiled programs.

The log shows that:
./a.out: error while loading shared libraries: liborte.so.0: cannot open shared 
object file: No such file or directory


CC="/export/home/eric/openmpi_i686/bin/mpicc" ./configure 
--with-mpi=$HOME/openmpi_`uname -m`
Same problems as above...

LDFLAGS="$HOME/openmpi_`uname -m`/lib" ./configure 
--with-mpi=$HOME/openmpi_`uname -m`

Configure fails on:
checking for C compiler default output file name... configure: error: C 
compiler cannot create executables

And...finally (not that all of this was done in the presented order):
./configure --with-mpi=$HOME/openmpi_`uname -m`

Which ends with:

checking for library containing MPI_Init... no
configure: error: Could not find MPI library

Can anyone help me with this one...?

Note that LAM-MPI is also installed on these systems...

Eric Thibodeau


[OMPI users] ORTE errors on simple fortran program with 1.2b3

2007-02-15 Thread Steven A. DuChene
I am trying to do some simple fortran MPI examples to verify I have a good
installation of OpenMPI, and I have a distributed program that calculates pi.
It seems to compile and work fine with 1.1.4, but when I compile and run the
same program with 1.2b3 I get a bunch of the same ORTE errors and then my shell is locked up:

[node001:30268] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space in 
file dss/dss_unpack.c at line 90
[node001:30268] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space in 
file gpr_replica_cmd_processor.c at line 361

I then do a Ctrl-C and it tells me "mpirun: killing job..." but my shell never 
comes back.


I do get the following compile time warnings when I build the simple app with 
either 1.1.4 or 1.2b3:

mpif90 -c -I/opt/openmpi/1.1.4/include pi.f
In file pi.f:73

  call mpi_reduce(times(1), total, 1, mpi_real,
  1
In file pi.f:67

  call mpi_reduce(piece, pi, 1, mpi_double_precision,
  2
Warning (155): Inconsistent types (REAL(4)/REAL(8)) in actual argument lists at 
(1) and (2)
mpif90 -o pi pi.o f.o -L /opt/openmpi/1.1.4/lib -lmpi




Re: [OMPI users] where do the openmpi profile.d scripts get created?

2007-02-15 Thread Jeff Squyres

On Feb 13, 2007, at 4:29 PM, Steven A. DuChene wrote:

I discovered the hard way that there are openmpi profile.d scripts that get
packaged into openmpi rpm files. The reason this became a painful issue
for our cluster is that it seems the csh profile.d script that gets installed
with openmpi-runtime-1.1.4 is defective. If it gets sourced into a user's
environment, it makes tcsh on linux error out with an "if: Badly formed number"
error.


Yoinks.  Well, it goes to show how many people used that SRPM.  :-)

Sorry about those -- I have most of those fixed on the trunk but  
forgot to back-port most of the fixes back to the 1.1 branch SRPM  
specfile.


I want to be able to alter the spec file that builds the rpm so I can have it
automagically incorporate the patch we worked up to fix this issue, but I have
not been able to figure out where in the openmpi sources the profile.d
scripts for csh and sh get generated.


They're actually generated in the specfile itself.
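
For anyone who wants to rebuild before an official fix ships, the usual SRPM workflow would look roughly like this (a sketch; the exact SRPM name, spec file name, and build directory depend on your rpm setup):

# unpack the source RPM so the spec file lands in the rpmbuild tree
rpm -i openmpi-1.1.4-1.src.rpm
# edit the part of the spec that emits the profile.d csh/sh scripts
vi /usr/src/redhat/SPECS/openmpi-1.1.4.spec
# rebuild the binary RPMs from the patched spec
rpmbuild -bb /usr/src/redhat/SPECS/openmpi-1.1.4.spec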


We had to patch the openmpi-1.1.4.csh script as follows:

-if ("") then
-setenv PATH ${PATH}:/opt/openmpi-g95/1.1.4/bin/
+if ( $?PATH ) then
+setenv PATH ${PATH}:/opt/openmpi/1.1.4/bin/
 endif
-if ("1LD_LIBRARY_PATH") then
-if ("") then
-setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/opt/openmpi-g95/1.1.4/lib
-endif
+if ( $?LD_LIBRARY_PATH ) then
+setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:/opt/openmpi/1.1.4/lib
 endif
-if ("1MANPATH") then
-if ("") then
-setenv MANPATH ${MANPATH}:/opt/openmpi-g95/1.1.4/man
-endif
+if ( $?MANPATH ) then
+setenv MANPATH ${MANPATH}:/opt/openmpi/1.1.4/man
 endif


Most of this is due to bad escaping (i.e., lack thereof) in the spec
file.  I'll fix that up shortly.  We're likely to do a 1.1.5 release  
in the not-distant future -- is it ok to wait for that, or do you  
need a new 1.1.4 SRPM?
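
For reference, the same guard the patch introduces for csh, expressed for the Bourne-shell profile.d script, would look roughly like this (a sketch assuming the same /opt/openmpi/1.1.4 prefix; not the actual specfile output):

# only append when the variable is already set and non-empty (the csh side uses $?VAR)
if [ -n "$PATH" ]; then
    PATH=$PATH:/opt/openmpi/1.1.4/bin/
    export PATH
fi
if [ -n "$LD_LIBRARY_PATH" ]; then
    LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi/1.1.4/lib
    export LD_LIBRARY_PATH
fi
if [ -n "$MANPATH" ]; then
    MANPATH=$MANPATH:/opt/openmpi/1.1.4/man
    export MANPATH
fi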


Thanks for bringing it to my attention!

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems



Re: [OMPI users] ORTE errors on simple fortran program with 1.2b3

2007-02-15 Thread Brian Barrett

What platform / operating system was this with?

Brian

On Feb 15, 2007, at 3:43 PM, Steven A. DuChene wrote:

I am trying to do some simple fortran MPI examples to verify I have  
a good installation
of OpenMPI and I have a distributed program that calculates PI. It  
seems to compile
and work fine with 1.1.4 but when I compile and run the same
program with 1.2b3

I get a bunch of the same ORTE errors and then my shell is locked up:

[node001:30268] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate  
space in file dss/dss_unpack.c at line 90
[node001:30268] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate  
space in file gpr_replica_cmd_processor.c at line 361


I then do a Ctrl-C and it tells me "mpirun: killing job..." but my  
shell never comes back.



I do get the following compile time warnings when I build the  
simple app with either 1.1.4 or 1.2b3:


mpif90 -c -I/opt/openmpi/1.1.4/include pi.f
In file pi.f:73

  call mpi_reduce(times(1), total, 1, mpi_real,
  1
In file pi.f:67

  call mpi_reduce(piece, pi, 1, mpi_double_precision,
  2
Warning (155): Inconsistent types (REAL(4)/REAL(8)) in actual  
argument lists at (1) and (2)

mpif90 -o pi pi.o f.o -L /opt/openmpi/1.1.4/lib -lmpi






Re: [OMPI users] ORTE errors on simple fortran program with 1.2b3

2007-02-15 Thread Jeff Squyres

On Feb 15, 2007, at 5:43 PM, Steven A. DuChene wrote:

I am trying to do some simple fortran MPI examples to verify I have  
a good installation
of OpenMPI and I have a distributed program that calculates PI. It  
seems to compile
and work fine with 1.1.4 but when I compile and run the same
program with 1.2b3

I get a bunch of the same ORTE errors and then my shell is locked up:

[node001:30268] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate  
space in file dss/dss_unpack.c at line 90
[node001:30268] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate  
space in file gpr_replica_cmd_processor.c at line 361


I then do a Ctrl-C and it tells me "mpirun: killing job..." but my  
shell never comes back.


We had some problems with this in 1.2b3.  I honestly don't remember  
if we fixed them by 1.2b4 or not -- could you try a recent 1.2  
nightly snapshot?  they should be fixed there:


http://www.open-mpi.org/nightly/v1.2/
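
A quick sketch of grabbing and building a snapshot (the tarball name below is just an example; use whatever the current snapshot on that page is, and pick your own prefix):

wget http://www.open-mpi.org/nightly/v1.2/openmpi-1.2b4r13658.tar.gz
tar xzf openmpi-1.2b4r13658.tar.gz
cd openmpi-1.2b4r13658
./configure --prefix=/opt/openmpi/1.2-nightly && make all install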

I do get the following compile time warnings when I build the  
simple app with either 1.1.4 or 1.2b3:


mpif90 -c -I/opt/openmpi/1.1.4/include pi.f
In file pi.f:73

  call mpi_reduce(times(1), total, 1, mpi_real,
  1
In file pi.f:67

  call mpi_reduce(piece, pi, 1, mpi_double_precision,
  2
Warning (155): Inconsistent types (REAL(4)/REAL(8)) in actual  
argument lists at (1) and (2)

mpif90 -o pi pi.o f.o -L /opt/openmpi/1.1.4/lib -lmpi


I'm not a Fortran expert, but I think that this is the f90 compiler
telling you that you have inconsistent types for the first argument
of MPI_REDUCE.  This is mainly because there is no equivalent in  
Fortran to C's (void*) type -- it's the compiler trying to be helpful  
saying, "Hey, I noticed you have inconsistent types in successive  
calls to the same function.  Did you really mean to do that?"


For MPI apps using choice buffers (like the first argument in  
MPI_REDUCE), yes, you did mean to do that -- it's ok.  This is not  
really an OMPI issue, but rather a Fortran compiler issue.  What you  
might try is:


- use mpif77 instead (although, depending on your compiler, the  
result may be exactly the same)
- poke through your fortran compiler's docs and see if there's a flag  
that disables this warning


Hope that helps.
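
As a concrete example of the first suggestion, reusing the compile and link lines quoted above with the f77 wrapper (whether it actually silences warning 155 depends on the back-end Fortran compiler):

mpif77 -c -I/opt/openmpi/1.1.4/include pi.f
mpif77 -o pi pi.o f.o -L/opt/openmpi/1.1.4/lib -lmpi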

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems



Re: [OMPI users] compiling mpptest using OpenMPI

2007-02-15 Thread Jeff Squyres
I think you want to add $HOME/openmpi_`uname -m`/lib to your  
LD_LIBRARY_PATH.  This should allow executables created by mpicc (or  
any derivation thereof, such as extracting flags via showme) to find  
the Right shared libraries.


Let us know if that works for you.

FWIW, we do recommend using the wrapper compilers over extracting the  
flags via --showme whenever possible (it's just simpler and should do  
what you need).
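
A minimal sketch of that setup for the per-architecture layout described below, assuming a Bourne-style shell:

# make the wrapper compilers and the Open MPI shared libraries visible
export PATH="$HOME/openmpi_`uname -m`/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/openmpi_`uname -m`/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# then let mpicc supply the MPI flags instead of pasting in --showme output
CC=mpicc ./configure --with-mpi="$HOME/openmpi_`uname -m`"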



On Feb 15, 2007, at 3:38 PM, Eric Thibodeau wrote:


Hello all,


I have been attempting to compile mpptest on my nodes in vain. Here  
is my current setup:



Openmpi is in "$HOME/openmpi_`uname -m`" which translates to
"/export/home/eric/openmpi_i686/". I tried the following approaches
(you can see some of these were out of desperation):



CFLAGS=`mpicc --showme:compile` LDFLAGS=`mpicc --showme:link` ./configure



Configure fails on:

checking whether the C compiler works... configure: error: cannot  
run C compiled programs.



The log shows that:

./a.out: error while loading shared libraries: liborte.so.0: cannot  
open shared object file: No such file or directory




CC="/export/home/eric/openmpi_i686/bin/mpicc" ./configure --with-mpi=$HOME/openmpi_`uname -m`


Same problems as above...


LDFLAGS="$HOME/openmpi_`uname -m`/lib" ./configure --with-mpi=$HOME/openmpi_`uname -m`



Configure fails on:

checking for C compiler default output file name... configure:  
error: C compiler cannot create executables



And...finally (not that all of this was done in the presented order):

./configure --with-mpi=$HOME/openmpi_`uname -m`


Which ends with:


checking for library containing MPI_Init... no

configure: error: Could not find MPI library


Can anyone help me with this one...?


Note that LAM-MPI is also installed on these systems...


Eric Thibodeau





--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems



Re: [OMPI users] compiling mpptest using OpenMPI

2007-02-15 Thread Eric Thibodeau
Hi Jeff,

Thanks for your response. I eventually figured it out; here is the
only way I got mpptest to compile:

export LD_LIBRARY_PATH="$HOME/openmpi_`uname -m`/lib"
CC="$HOME/openmpi_`uname -m`/bin/mpicc" ./configure 
--with-mpi="$HOME/openmpi_`uname -m`"

And yes, I know I should use the mpicc wrapper and all (I do RTFM :P), but
mpptest is less than cooperative and hasn't been updated lately, AFAIK.

I'll keep you posted as I get some results out (testing
TCP/IP as well as HyperTransport on a Tyan Beast). Up to now, LAM-MPI
seems less efficient at async communications and shows no improvements
with persistent communications under TCP/IP. OpenMPI, on the other hand,
seems more efficient using persistent communications in a
HyperTransport (shmem) environment... I know I am crossing many test
boundaries, but I will post some PNGs of my results (as well as how I got to
them ;)
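
For what it's worth, the two cases being compared can be pinned down explicitly with MCA parameters; a rough sketch (process count and placement are illustrative):

mpirun -np 2 --mca btl tcp,self ./mpptest      # force the TCP/IP path
mpirun -np 2 --mca btl sm,self ./mpptest       # shared memory on the HyperTransport box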

Eric

On Thu, 15 Feb 2007, Jeff Squyres wrote:

> I think you want to add $HOME/openmpi_`uname -m`/lib to your  
> LD_LIBRARY_PATH.  This should allow executables created by mpicc (or  
> any derivation thereof, such as extracting flags via showme) to find  
> the Right shared libraries.
> 
> Let us know if that works for you.
> 
> FWIW, we do recommend using the wrapper compilers over extracting the  
> flags via --showme whenever possible (it's just simpler and should do  
> what you need).
> 
> 
> On Feb 15, 2007, at 3:38 PM, Eric Thibodeau wrote:
> 
> > Hello all,
> >
> >
> > I have been attempting to compile mpptest on my nodes in vain. Here  
> > is my current setup:
> >
> >
> > Openmpi is in "$HOME/openmpi_`uname -m`" which translates to
> > "/export/home/eric/openmpi_i686/". I tried the following approaches
> > (you can see some of these were out of desperation):
> >
> >
> > CFLAGS=`mpicc --showme:compile` LDFLAGS=`mpicc --showme:link` ./configure
> >
> >
> > Configure fails on:
> >
> > checking whether the C compiler works... configure: error: cannot  
> > run C compiled programs.
> >
> >
> > The log shows that:
> >
> > ./a.out: error while loading shared libraries: liborte.so.0: cannot  
> > open shared object file: No such file or directory
> >
> >
> >
> > CC="/export/home/eric/openmpi_i686/bin/mpicc" ./configure --with-mpi=$HOME/openmpi_`uname -m`
> >
> > Same problems as above...
> >
> >
> > LDFLAGS="$HOME/openmpi_`uname -m`/lib" ./configure --with-mpi=$HOME/openmpi_`uname -m`
> >
> >
> > Configure fails on:
> >
> > checking for C compiler default output file name... configure:  
> > error: C compiler cannot create executables
> >
> >
> > And...finally (not that all of this was done in the presented order):
> >
> > ./configure --with-mpi=$HOME/openmpi_`uname -m`
> >
> >
> > Which ends with:
> >
> >
> > checking for library containing MPI_Init... no
> >
> > configure: error: Could not find MPI library
> >
> >
> > Can anyone help me with this one...?
> >
> >
> > Note that LAM-MPI is also installed on these systems...
> >
> >
> > Eric Thibodeau
> >
> >
> 
> 
> 



Re: [OMPI users] compiling mpptest using OpenMPI

2007-02-15 Thread Anthony Chan

As long as mpicc is working, try configuring mpptest as

mpptest/configure MPICC=<MPI install dir>/bin/mpicc

or

mpptest/configure --with-mpich=<MPI install dir>

A.Chan
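
With the install layout Eric described earlier, those two forms would look something like this (a sketch, not verified against mpptest's configure):

./configure MPICC="$HOME/openmpi_`uname -m`/bin/mpicc"
# or
./configure --with-mpich="$HOME/openmpi_`uname -m`"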

On Thu, 15 Feb 2007, Eric Thibodeau wrote:

> Hi Jeff,
>
>   Thanks for your response. I eventually figured it out; here is the
> only way I got mpptest to compile:
>
> export LD_LIBRARY_PATH="$HOME/openmpi_`uname -m`/lib"
> CC="$HOME/openmpi_`uname -m`/bin/mpicc" ./configure 
> --with-mpi="$HOME/openmpi_`uname -m`"
>
> And yes, I know I should use the mpicc wrapper and all (I do RTFM :P), but
> mpptest is less than cooperative and hasn't been updated lately, AFAIK.
>
> I'll keep you posted as I get some results out (testing
> TCP/IP as well as HyperTransport on a Tyan Beast). Up to now, LAM-MPI
> seems less efficient at async communications and shows no improvements
> with persistent communications under TCP/IP. OpenMPI, on the other hand,
> seems more efficient using persistent communications in a
> HyperTransport (shmem) environment... I know I am crossing many test
> boundaries, but I will post some PNGs of my results (as well as how I got to
> them ;)
>
> Eric
>
> On Thu, 15 Feb 2007, Jeff Squyres wrote:
>
> > I think you want to add $HOME/openmpi_`uname -m`/lib to your
> > LD_LIBRARY_PATH.  This should allow executables created by mpicc (or
> > any derivation thereof, such as extracting flags via showme) to find
> > the Right shared libraries.
> >
> > Let us know if that works for you.
> >
> > FWIW, we do recommend using the wrapper compilers over extracting the
> > flags via --showme whenever possible (it's just simpler and should do
> > what you need).
> >
> >
> > On Feb 15, 2007, at 3:38 PM, Eric Thibodeau wrote:
> >
> > > Hello all,
> > >
> > >
> > > I have been attempting to compile mpptest on my nodes in vain. Here
> > > is my current setup:
> > >
> > >
> > > Openmpi is in "$HOME/openmpi_`uname -m`" which translates to
> > > "/export/home/eric/openmpi_i686/". I tried the following approaches
> > > (you can see some of these were out of desperation):
> > >
> > >
> > > CFLAGS=`mpicc --showme:compile` LDFLAGS=`mpicc --showme:link` ./configure
> > >
> > >
> > > Configure fails on:
> > >
> > > checking whether the C compiler works... configure: error: cannot
> > > run C compiled programs.
> > >
> > >
> > > The log shows that:
> > >
> > > ./a.out: error while loading shared libraries: liborte.so.0: cannot
> > > open shared object file: No such file or directory
> > >
> > >
> > >
> > > CC="/export/home/eric/openmpi_i686/bin/mpicc" ./configure --with-mpi=$HOME/openmpi_`uname -m`
> > >
> > > Same problems as above...
> > >
> > >
> > > LDFLAGS="$HOME/openmpi_`uname -m`/lib" ./configure --with-mpi=$HOME/openmpi_`uname -m`
> > >
> > >
> > > Configure fails on:
> > >
> > > checking for C compiler default output file name... configure:
> > > error: C compiler cannot create executables
> > >
> > >
> > > And...finally (not that all of this was done in the presented order):
> > >
> > > ./configure --with-mpi=$HOME/openmpi_`uname -m`
> > >
> > >
> > > Which ends with:
> > >
> > >
> > > checking for library containing MPI_Init... no
> > >
> > > configure: error: Could not find MPI library
> > >
> > >
> > > Can anyone help me with this one...?
> > >
> > >
> > > Note that LAM-MPI is also installed on these systems...
> > >
> > >
> > > Eric Thibodeau
> > >
> > >
> >
> >
> >
>


Re: [OMPI users] ORTE errors on simple fortran program with 1.2b3

2007-02-15 Thread Steven A. DuChene
Brian:
These are dual proc AMD Opteron systems running RHEL4u2

-Original Message-
>From: Brian Barrett 
>Sent: Feb 15, 2007 4:02 PM
>To: "Steven A. DuChene" , Open MPI Users 
>
>Subject: Re: [OMPI users] ORTE errors on simple fortran program with 1.2b3
>
>What platform / operating system was this with?
>
>Brian
>
>On Feb 15, 2007, at 3:43 PM, Steven A. DuChene wrote:
>
>> I am trying to do some simple fortran MPI examples to verify I have  
>> a good installation
>> of OpenMPI and I have a distributed program that calculates PI. It  
>> seems to compile
>> and work fine with 1.1.4 but when I compile and run the same
>> program with 1.2b3
>> I get a bunch of the same ORTE errors and then my shell is locked up:
>>
>> [node001:30268] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate  
>> space in file dss/dss_unpack.c at line 90
>> [node001:30268] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate  
>> space in file gpr_replica_cmd_processor.c at line 361
>>
>> I then do a Ctrl-C and it tells me "mpirun: killing job..." but my  
>> shell never comes back.
>>
>>
>> I do get the following compile time warnings when I build the  
>> simple app with either 1.1.4 or 1.2b3:
>>
>> mpif90 -c -I/opt/openmpi/1.1.4/include pi.f
>> In file pi.f:73
>>
>>   call mpi_reduce(times(1), total, 1, mpi_real,
>>   1
>> In file pi.f:67
>>
>>   call mpi_reduce(piece, pi, 1, mpi_double_precision,
>>   2
>> Warning (155): Inconsistent types (REAL(4)/REAL(8)) in actual  
>> argument lists at (1) and (2)
>> mpif90 -o pi pi.o f.o -L /opt/openmpi/1.1.4/lib -lmpi
>>
>>
>





Re: [OMPI users] ORTE errors on simple fortran program with 1.2b3

2007-02-15 Thread Steven A. DuChene
Jeff:
I built openmpi-1.2b4r13658 and tried the test again and my example fortran 
program
did indeed work fine with that release.
Thanks

-Original Message-
>From: Jeff Squyres 
>Sent: Feb 15, 2007 4:09 PM
>To: "Steven A. DuChene" , Open MPI Users 
>
>Subject: Re: [OMPI users] ORTE errors on simple fortran program with 1.2b3
>
>On Feb 15, 2007, at 5:43 PM, Steven A. DuChene wrote:
>
>> I am trying to do some simple fortran MPI examples to verify I have  
>> a good installation
>> of OpenMPI and I have a distributed program that calculates PI. It  
>> seems to compile
>> and work fine with 1.1.4 but when I compile and run the same
>> program with 1.2b3
>> I get a bunch of the same ORTE errors and then my shell is locked up:
>>
>> [node001:30268] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate  
>> space in file dss/dss_unpack.c at line 90
>> [node001:30268] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate  
>> space in file gpr_replica_cmd_processor.c at line 361
>>
>> I then do a Ctrl-C and it tells me "mpirun: killing job..." but my  
>> shell never comes back.
>
>We had some problems with this in 1.2b3.  I honestly don't remember  
>if we fixed them by 1.2b4 or not -- could you try a recent 1.2  
>nightly snapshot?  they should be fixed there:
>
> http://www.open-mpi.org/nightly/v1.2/
>
>> I do get the following compile time warnings when I build the  
>> simple app with either 1.1.4 or 1.2b3:
>>
>> mpif90 -c -I/opt/openmpi/1.1.4/include pi.f
>> In file pi.f:73
>>
>>   call mpi_reduce(times(1), total, 1, mpi_real,
>>   1
>> In file pi.f:67
>>
>>   call mpi_reduce(piece, pi, 1, mpi_double_precision,
>>   2
>> Warning (155): Inconsistent types (REAL(4)/REAL(8)) in actual  
>> argument lists at (1) and (2)
>> mpif90 -o pi pi.o f.o -L /opt/openmpi/1.1.4/lib -lmpi
>
>I'm not a Fortran expert, but I think that this is the f90 compiler
>telling you that you have inconsistent types for the first argument  
>of MPI_REDUCE.  This is mainly because there is no equivalent in  
>Fortran to C's (void*) type -- it's the compiler trying to be helpful  
>saying, "Hey, I noticed you have inconsistent types in successive  
>calls to the same function.  Did you really mean to do that?"
>
>For MPI apps using choice buffers (like the first argument in  
>MPI_REDUCE), yes, you did mean to do that -- it's ok.  This is not  
>really an OMPI issue, but rather a Fortran compiler issue.  What you  
>might try is:
>
>- use mpif77 instead (although, depending on your compiler, the  
>result may be exactly the same)
>- poke through your fortran compiler's docs and see if there's a flag  
>that disables this warning
>
>Hope that helps.
>
>-- 
>Jeff Squyres
>Server Virtualization Business Unit
>Cisco Systems
>





Re: [OMPI users] NetPipe benchmark & spanning multiple interconnects

2007-02-15 Thread Galen Shipman


Good point, this may be affecting overall performance for openib+gm.
But I didn't see any performance improvement for gm+tcp over just
using gm (and there's definitely no memory bandwidth limitation
there).


I wouldn't expect you to see any benefit with GM+TCP; the overhead
costs of TCP are so high that you may end up having a hard time
keeping up with GM and spending too much time trying to service TCP.



Please correct me if I'm wrong, but it appears that message
striping was implemented primarily with ethernet interfaces in mind.


This is not correct; striping was designed in a network-agnostic fashion.
It is not optimal, but it certainly was not designed primarily for ethernet.



 It doesn't seem to have much
impact when combining more "serious" interconnects. If anybody has
tried this before and has evidence to the contrary, I'd love to hear
it.



I guess I'm not sure what defines a "serious" interconnect, if you  
mean interconnects with high bandwidth and low latency then I would  
agree that the impact on measured bandwidth will show a bottleneck  
elsewhere in the system such as memory.



So the "solution" for micro-benchmarks is to register the memory and
leave it registered. Probably the best way to do this is to use
MPI_ALLOC_MEM when allocating memory; this allows us to register the
memory with all the available NICs.

Unfortunately, when it comes to using industry-standard benchmarks,
it's undesirable to modify the source.


No argument here; just pointing out that the high cost of memory
registration is part of the equation.

You may also try -mca mpi_leave_pinned 1 if you haven't already.
I will be the first to admit, however, that this is entirely
artificial, but then again, some would argue that so is NetPipe.
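
A sketch of that run line for a NetPipe-style test (the NPmpi binary name is whatever your NetPipe build produced; the BTL list matches the openib+gm case discussed above):

mpirun -np 2 --mca mpi_leave_pinned 1 ./NPmpi                           # leave registrations cached
mpirun -np 2 --mca mpi_leave_pinned 1 --mca btl openib,gm,self ./NPmpi  # same, restricted to the two NICs under test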





I would also say that this is a very uncommon mode of operation; our
architecture allows it, but it certainly isn't optimized for this case.

I suspect the issue may also be of a purely business nature. The
developers of BTL modules for advanced interconnects are most likely
employees of the corresponding companies, which probably do not have
any vested interest in making their interconnects synergistically
coexist with those of their competitors or with interconnects the
companies are dropping support for.


This is actually not the case: no interconnect company has (to this
date) created any BTL, although many are now contributing, some to a
very large extent.
I can assure you that this is in no way an issue of "competitive
advantage" through intentionally not playing nicely together.
Rather, the real issue is one of time and monkeys; heterogeneous
multi-NIC support is not currently at the top of the list!



- Galen




Many thanks,
Alex.




On Feb 12, 2007, at 6:48 PM, Alex Tumanov wrote:

Can anyone else provide some feedback/comments on this issue? How
typical/widespread is the use of multiple interconnects in the HPC
community? Judging from the feedback I'm getting in this thread, it
appears that this is fairly uncommon.

Thanks for your attention to this thread.

Alex.

On 2/8/07, Alex Tumanov  wrote:

Thanks for your insight George.


Strange, the latency is supposed to be there too. Anyway, the latency
is only used to determine which one is faster, in order to use it
for small messages.


I searched the code base for mca parameter registering and did indeed
discover that latency setting is possible for tcp and tcp alone:
---------------------------------------------------------------------
[OMPISRCDIR]# grep -r param_register * | egrep -i "latency|bandwidth"
ompi/mca/btl/openib/btl_openib_component.c: mca_btl_openib_param_register_int("bandwidth", "Approximate maximum bandwidth of interconnect",
ompi/mca/btl/tcp/btl_tcp_component.c: btl->super.btl_bandwidth = mca_btl_tcp_param_register_int(param, 0);
ompi/mca/btl/tcp/btl_tcp_component.c: btl->super.btl_latency = mca_btl_tcp_param_register_int(param, 0);
ompi/mca/btl/gm/btl_gm_component.c: mca_btl_gm_param_register_int("bandwidth", 250);
ompi/mca/btl/mvapi/btl_mvapi_component.c: mca_btl_mvapi_param_register_int("bandwidth", "Approximate maximum bandwidth of interconnect",
---------------------------------------------------------------------
For all others, btl_latency appears to be set to zero when the btl
module gets constructed. Would zero latency prevent message  
striping?


An interesting side-issue that surfaces as a result of this little
investigation is the inconsistency between the ompi_info output and
the actual mca param availability for tcp_latency:

[OMPISRCDIR]# ompi_info --param all all | egrep -i "latency|bandwidth"
    MCA btl: parameter "btl_gm_bandwidth" (current value: "250")
    MCA btl: parameter "btl_mvapi_bandwidth" (current value: "800")
             Approximate maximum bandwidth of interconnect
    MCA btl: parameter "btl_openib_bandwi