Re: [OMPI users] Problem with openmpi and infiniband

2008-12-25 Thread Jeff Squyres
Another thing to try is a change that we made late in the Open MPI  
v1.2 series with regards to IB:


http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion



On Dec 24, 2008, at 10:07 PM, Tim Mattox wrote:


For your runs with Open MPI over InfiniBand, try using openib,sm,self
for the BTL setting, so that shared memory communications are used
within a node.  It would give us another datapoint to help diagnose
the problem.  As for other things we would need to help diagnose the
problem, please follow the advice on this FAQ entry, and the help  
page:

http://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot
http://www.open-mpi.org/community/help/
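
For concreteness, a command-line sketch of the kind of run being suggested
(the benchmark binary and process count here are illustrative, not from the
original message):

  % mpirun --mca btl openib,sm,self -np 6 ./IMB-MPI1 Barrier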

On Wed, Dec 24, 2008 at 5:55 AM, Biagio Lucini  
 wrote:

Pavel Shamis (Pasha) wrote:


Biagio Lucini wrote:


Hello,

I am new to this list, where I hope to find a solution for a problem
that I have been having for quite a long time.

I run various versions of Open MPI (from 1.1.2 to 1.2.8) on a cluster
with InfiniBand interconnects that I use and administer at the same
time. The OpenFabrics stack is OFED-1.2.5, and the compilers are gcc 4.2
and Intel. The queue manager is SGE 6.0u8.


Do you use the Open MPI version that is included in OFED? Were you able
to run basic OFED/OMPI tests/benchmarks between two nodes?



Hi,

yes to both questions: the OMPI version is the one that comes with OFED
(1.1.2-1) and the basic tests run fine. For instance, IMB-MPI1 (which is
more than basic, as far as I can see) reports for the last test:

#---
# Benchmarking Barrier
# #processes = 6
#---
#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000        22.93        22.95        22.94


for the openib,self btl (6 processes, all processes on different nodes)

and

#---
# Benchmarking Barrier
# #processes = 6
#---
#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000       191.30       191.42       191.34

for the tcp,self btl (same test)

No anomalies for other tests (ping-pong, all-to-all etc.)

Thanks,
Biagio


--
=

Dr. Biagio Lucini
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284

=





--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
   I'm a bright... http://www.the-brights.net/



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] sending message to the source(0) from other processors

2008-12-25 Thread Jeff Squyres
FWIW: you might want to take an MPI tutorial; they're really helpful  
for learning MPI's capabilities and how to use the primitives.  The  
NCSA has 2 excellent MPI tutorials (intro and advanced); they both  
require free registration:


http://ci-tutor.ncsa.uiuc.edu/login.php


On Dec 24, 2008, at 10:52 PM, Win Than Aung wrote:

I got the solution. I just need to set the appropriate tag to send  
and receive.

sorry for asking
thanks
winthan
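
For completeness, a minimal sketch of the tag-based approach described above;
the tag values, variable names, and stand-in sums are placeholders rather than
code from the original post:

#include <stdio.h>
#include <mpi.h>

#define TAG_EVEN 100   /* placeholder tag for the even-line sum */
#define TAG_ODD  101   /* placeholder tag for the odd-line sum  */

int main(int argc, char **argv) {
  int np, me, peer;
  double even_sum, odd_sum, v;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &np);
  MPI_Comm_rank(MPI_COMM_WORLD, &me);

  /* Stand-in values; in the real code these would be sums read from the files. */
  even_sum = 2.0 * me;
  odd_sum  = 2.0 * me + 1.0;

  if (me == 0) {
    for (peer = 1; peer < np; peer++) {
      /* The tag, not the arrival order, says which value this is. */
      MPI_Recv(&v, 1, MPI_DOUBLE, peer, TAG_EVEN, MPI_COMM_WORLD, &status);
      printf("rank %d even sum %f\n", peer, v);
      MPI_Recv(&v, 1, MPI_DOUBLE, peer, TAG_ODD, MPI_COMM_WORLD, &status);
      printf("rank %d odd  sum %f\n", peer, v);
    }
  } else {
    MPI_Send(&even_sum, 1, MPI_DOUBLE, 0, TAG_EVEN, MPI_COMM_WORLD);
    MPI_Send(&odd_sum,  1, MPI_DOUBLE, 0, TAG_ODD,  MPI_COMM_WORLD);
  }

  MPI_Finalize();
  return 0;
}

Because the root matches on TAG_EVEN and TAG_ODD explicitly, the order in which
the messages arrive no longer matters.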

On Wed, Dec 24, 2008 at 10:36 PM, Win Than Aung   
wrote:

Thanks Eugene for your example, it helps me a lot.
I bumped into one more problem.
Let's say I have the file content as follows.
I have a total of six files, which all contain real and imaginary values.
"
1.001212 1.0012121  //0th
1.001212 1.0012121  //1st
1.001212 1.0012121  //2nd
1.001212 1.0012121  //3rd
1.001212 1.0012121  //4th
1.001212 1.0012121 //5th
1.001212 1.0012121 //6th
"
char send_buffer[1000];
I use "mpirun -np 6 a.out", which means each processor gets access to one file.
Each processor will add the "0th and 2nd" (even) values; those sums will be
sent to the root processor and saved as "file_even_add.dat". Each processor
will also add the "1st and 3rd" (odd) values; those sums will be sent to the
root processor (here rank 0) and saved as "file_odd_add.dat".


char recv_buffer[1000];
FILE* filepteven;
FILE* fileptodd;
MPI_Status status;
if (mpi_my_id == root)
{
    filepteven = fopen("C:\\fileeven.dat", "w");
    fileptodd  = fopen("C:\\fileodd.dat", "w");
    int peer = 0;
    for (peer = 0; peer < num_msgs; peer++)  /* loop bound was lost in the archive */
    {
        /* MPI_ANY_SOURCE/MPI_ANY_TAG: accept the messages in whatever order they arrive */
        MPI_Recv(recv_buffer, MAX_STR_LEN, MPI_BYTE, MPI_ANY_SOURCE,
                 MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        fprintf(filepteven, "%s \n", recv_buffer);
    }
}

My question is: how do I know which send buffer has the even-sum values and
which has the odd-sum values? In which order did they get sent?

thanks
winthan

On Tue, Dec 23, 2008 at 3:53 PM, Eugene Loh   
wrote:

Win Than Aung wrote:


thanks for your reply, Jeff

so I tried the following



#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
 int np, me, sbuf = -1, rbuf = -2,mbuf=1000;
int data[2];
 MPI_Init(&argc,&argv);
 MPI_Comm_size(MPI_COMM_WORLD,&np);
 MPI_Comm_rank(MPI_COMM_WORLD,&me);
 if ( np < 2 ) MPI_Abort(MPI_COMM_WORLD,-1);

 if ( me == 1 ) MPI_Send(&sbuf,1,MPI_INT,0,344,MPI_COMM_WORLD);
if(me==2) MPI_Send( &mbuf,1,MPI_INT,0,344,MPI_COMM_WORLD);
if ( me == 0 ) {
  MPI_Recv(data,2,MPI_INT,MPI_ANY_SOURCE,344,MPI_COMM_WORLD,MPI_STATUS_IGNORE);

 }

 MPI_Finalize();

 return 0;
}

it can successfully receive the one sent from processor 1 (me==1), but
it failed to receive the one sent from processor 2 (me==2)

mpirun -np 3 hello
There is only one receive, so it receives only one message.  When  
you specify the element count for the receive, you're only  
specifying the size of the buffer into which the message will be  
received.  Only after the message has been received can you inquire  
how big the message actually was.
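
As an aside on that last point (this snippet is not from the original thread):
MPI_Get_count on the status returned by MPI_Recv reports how many elements
actually arrived. A minimal sketch:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
  int np, me, count;
  int data[2];                 /* receive buffer large enough for 2 ints */
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &np);
  MPI_Comm_rank(MPI_COMM_WORLD, &me);
  if ( np < 2 ) MPI_Abort(MPI_COMM_WORLD, -1);

  if ( me == 1 ) {
    int sbuf = 42;
    /* Send a single int even though the receiver's buffer can hold two. */
    MPI_Send(&sbuf, 1, MPI_INT, 0, 344, MPI_COMM_WORLD);
  } else if ( me == 0 ) {
    MPI_Recv(data, 2, MPI_INT, MPI_ANY_SOURCE, 344, MPI_COMM_WORLD, &status);
    /* How many ints actually arrived (1 here), and from which rank? */
    MPI_Get_count(&status, MPI_INT, &count);
    printf("received %d int(s) from rank %d\n", count, status.MPI_SOURCE);
  }

  MPI_Finalize();
  return 0;
}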


Here is an example:


% cat a.c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
  int np, me, peer, value;


  MPI_Init(&argc,&argv);
  MPI_Comm_size(MPI_COMM_WORLD,&np);
  MPI_Comm_rank(MPI_COMM_WORLD,&me);

  value = me * me + 1;
  if ( me == 0 ) {
for ( peer = 0; peer < np; peer++ ) {
  if ( peer != 0 ) MPI_Recv(&value,1,MPI_INT,peer,343,MPI_COMM_WORLD,MPI_STATUS_IGNORE);

  printf("peer %d had value %d\n", peer, value);
}
  }
  else MPI_Send(&value,1,MPI_INT,0,343,MPI_COMM_WORLD);

  MPI_Finalize();

  return 0;
}
% mpirun -np 3 a.out
peer 0 had value 1
peer 1 had value 2
peer 2 had value 5
%

Alternatively,


#include <stdio.h>
#include <mpi.h>

#define MAXNP 1024

int main(int argc, char **argv) {
  int np, me, peer, value, values[MAXNP];


  MPI_Init(&argc,&argv);
  MPI_Comm_size(MPI_COMM_WORLD,&np);
  if ( np > MAXNP ) MPI_Abort(MPI_COMM_WORLD,-1);

  MPI_Comm_rank(MPI_COMM_WORLD,&me);
  value = me * me + 1;

  MPI_Gather(&value, 1, MPI_INT,
 values, 1, MPI_INT, 0, MPI_COMM_WORLD);

  if ( me == 0 )
for ( peer = 0; peer < np; peer++ )
  printf("peer %d had value %d\n", peer, values[peer]);

  MPI_Finalize();
  return 0;
}
% mpirun -np 3 a.out
peer 0 had value 1
peer 1 had value 2
peer 2 had value 5
%

Which is better?  Up to you.  The collective routines (like  
MPI_Gather) do offer MPI implementors (like people developing Open  
MPI) the opportunity to perform special optimizations (e.g., gather  
using a binary tree instead of having the root process perform so  
many receives).





--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Relocating an Open MPI installation using OPAL_PREFIX

2008-12-25 Thread Jeff Squyres
It's quite possible that we don't handle this situation properly.
Won't you need two libdirs (one for the 32-bit OMPI executables, and
one for the 64-bit MPI apps)?


On Dec 23, 2008, at 3:58 PM, Ethan Mallove wrote:


I think the problem is that I am doing a multi-lib build. I have
32-bit libraries in lib/, and 64-bit libraries in lib/64. I assume I
do not see the issue for 32-bit tests, because all the dependencies
are where Open MPI expects them to be. For the 64-bit case, I tried
setting OPAL_LIBDIR to /opt/openmpi-relocated/lib/lib64, but no luck.
Given the below configure arguments, what do my OPAL_* env vars need
to be? (Also, could using --enable-orterun-prefix-by-default interfere
with OPAL_PREFIX?)

   $ ./configure CC=cc CXX=CC F77=f77 FC=f90 --with-openib --without-udapl
       --disable-openib-ibcm --enable-heterogeneous --enable-cxx-exceptions
       --enable-shared --enable-orterun-prefix-by-default --with-sge
       --enable-mpi-f90 --with-mpi-f90-size=small --disable-mpi-threads
       --disable-progress-threads --disable-debug
       CFLAGS="-m32 -xO5" CXXFLAGS="-m32 -xO5" FFLAGS="-m32 -xO5" FCFLAGS="-m32 -xO5"
       --prefix=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install
       --mandir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/man
       --libdir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib
       --includedir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/include
       --without-mx --with-tm=/ws/ompi-tools/orte/torque/current/shared-install32
       --with-contrib-vt-flags="--prefix=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install --mandir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/man --libdir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib --includedir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/include LDFLAGS=-R/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib"


   $ ./configure CC=cc CXX=CC F77=f77 FC=f90 --with-openib --without-udapl
       --disable-openib-ibcm --enable-heterogeneous --enable-cxx-exceptions
       --enable-shared --enable-orterun-prefix-by-default --with-sge
       --enable-mpi-f90 --with-mpi-f90-size=small --disable-mpi-threads
       --disable-progress-threads --disable-debug
       CFLAGS="-m64 -xO5" CXXFLAGS="-m64 -xO5" FFLAGS="-m64 -xO5" FCFLAGS="-m64 -xO5"
       --prefix=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install
       --mandir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/man
       --libdir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib/lib64
       --includedir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/include/64
       --without-mx --with-tm=/ws/ompi-tools/orte/torque/current/shared-install64
       --with-contrib-vt-flags="--prefix=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install --mandir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/man --libdir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib/lib64 --includedir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/include/64 LDFLAGS=-R/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib"
       --disable-binaries


-Ethan




On Dec 22, 2008, at 12:42 PM, Ethan Mallove wrote:


Can anyone get OPAL_PREFIX to work on Linux? A simple test is to see
if the following works for any mpicc/mpirun:

$ mv  /tmp/foo
$ set OPAL_PREFIX /tmp/foo
$ mpicc ...
$ mpirun ...
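
Spelled out with a hypothetical install path (csh syntax, matching the setenv
snippet further down; /opt/openmpi and hello_c are placeholders, not from the
original message):

$ mv /opt/openmpi /tmp/foo
$ setenv OPAL_PREFIX /tmp/foo
$ /tmp/foo/bin/mpicc hello_c.c -o hello_c
$ /tmp/foo/bin/mpirun -np 2 ./hello_c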

If you are able to get the above to run successfully, I'm interested
in your config.log file.

Thanks,
Ethan


On Thu, Dec/18/2008 11:03:25AM, Ethan Mallove wrote:

Hello,

The below FAQ lists instructions on how to use a relocated Open MPI
installation:

http://www.open-mpi.org/faq/?category=building#installdirs

On Solaris, OPAL_PREFIX and friends (documented in the FAQ) work for
me with both MPI (hello_c) and non-MPI (hostname) programs. On Linux,
I can only get the non-MPI case to work. Here are the environment
variables I am setting:

$ cat setenv_opal_prefix.csh
set opal_prefix = "/opt/openmpi-relocated"

setenv OPAL_PREFIX         $opal_prefix
setenv OPAL_BINDIR         $opal_prefix/bin
setenv OPAL_SBINDIR        $opal_prefix/sbin
setenv OPAL_DATAROOTDIR    $opal_prefix/share
setenv OPAL_SYSCONFDIR     $opal_prefix/etc
setenv OPAL_SHAREDSTATEDIR $opal_prefix/com
setenv OPAL_LOCALSTATEDIR  $opal_prefix/var
setenv O