Re: [OMPI users] Issues with DL POLY

2007-06-08 Thread Aaron Thompson
Very interesting; I certainly hope that my problem is this and not
some kind of error.  I'll put the program on some more nodes, run
some tests, and see what runs fastest.
My only experience so far with MPI is with LAMMPS, and the simulation
I ran had an almost linear speedup from 1 -> 10 machines (11 hours ->
1.2 hours), very satisfying!


Aaron Thompson
Vanderbilt University
aaron.p.thomp...@vanderbilt.edu



On Jun 7, 2007, at 8:44 PM, Brock Palen wrote:


We have a few users using DLPOLY with OMPI.  Running just fine.
Watch what kind of simulation you are doing: as with all MD software,
not all simulations do better in parallel.  In some, the communication
overhead is much worse than running on just one CPU; I see this all
the time.  You could try just 2 CPUs on one node; sometimes that is OK
(memory access vs. network access).  But what you are seeing is not
uncommon.
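
For example, a quick single-node comparison run (the hostname and
binary name here are placeholders):

    mpirun -np 2 --host node01 ./DLPOLY.X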

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


On Jun 7, 2007, at 8:24 PM, Aaron Thompson wrote:


Hello,
Does anyone have experience using DL POLY with OpenMPI?  I've gotten
it to compile, but when I run a simulation using mpirun with two
dual-processor machines, it runs a little *slower* than on one CPU on
one machine!  Yet the program is running two instances on each node.
Any ideas?  The test programs included with OpenMPI show that it is
running correctly across multiple nodes.
Sorry if this is a little off-topic; I wasn't able to find help on
the official DL POLY mailing list.

Thank you!

Aaron Thompson
Vanderbilt University
aaron.p.thomp...@vanderbilt.edu







[OMPI users] mpirun in openmpi-1.2.2 fails to exit after client program finishes

2007-06-08 Thread Code Master

I compiled openmpi-1.2.2 with:

./configure CFLAGS="-g -pg -O3" \
  --prefix=/home/foo/490_research/490/src/mpi.optimized_profiling/ \
  --enable-mpi-threads --enable-progress-threads --enable-static \
  --disable-shared --without-memory-manager --without-libnuma \
  --disable-mpi-f77 --disable-mpi-f90 --disable-mpi-cxx \
  --disable-mpi-cxx-seek --disable-dlopen

(Thanks Jeff, now I know that I have to add --without-memory-manager and
--without-libnuma for static linking)

make all

make install

then I run my client app with:

/home/foo/490_research/490/src/mpi.optimized_profiling/bin/mpirun \
  --hostfile ../hostfile -n 32 raytrace -finputs/car.env

The program runs well and each process completes successfully (I can
tell because all processes have generated gmon.out successfully, and a
"ps aux" on the slave nodes (all except the originating node) shows
that my program has already exited there).  Therefore I think this may
have something to do with mpirun, which hangs forever.

Can you see anything wrong in my ./configure command which explains the
mpirun hang at the end of the run?  How can I fix it?

Thanks!


Re: [OMPI users] how to identify openmpi in configure script

2007-06-08 Thread Jeff Squyres
Would it be helpful if we provided some way to link in all the MPI  
language bindings?


Examples off the top of my head (haven't thought any of these through):

- mpicxx_all ...
- setenv OMPI_WRAPPER_WANT_ALL_LANGUAGE_BINDINGS
  mpicxx ...
- mpicxx -ompi:all_languages ...


On Jun 6, 2007, at 12:05 PM, Lie-Quan Lee wrote:


Hi Jeff,

Thanks for being willing to put more thought into it.  Here is my
simplified story.  I have an accelerator physics code, Omega3P, that
performs complex eigenmode analysis.  The algorithm for solving
eigensystems makes use of a 3rd-party sparse direct solver called
MUMPS (http://graal.ens-lyon.fr/MUMPS/).  Omega3P is written in C++
with MPI.  MUMPS is written in Fortran 95 with the MPI Fortran
bindings.  And MUMPS requires ScaLAPACK and BLACS (sometimes the
vendor provides a scientific library that includes both); those are
written in Fortran 77 with the MPI Fortran bindings.

I often need to compile them on various computer platforms with
different compilers, for a variety of reasons.  As I mentioned before,
I use the C++ compiler to link the final executable.  That requires
the MPI Fortran libraries as well as the general Fortran runtime
libraries.

What I did to solve the above problem is this: I have a configure
script in which I detect the compiler and the platform, and based on
that I add compiler- and platform-specific flags for the
Fortran-related pieces (libraries and library paths).  This works well
until it hits the next new platform/compiler...

Some compilers make the above job slightly easier.  For example, the
Pathscale compiler collection provides -lpathfortran for everything I
need to link the executable with the C++ compiler against
Fortran-compiled libraries.  The same goes for the IBM Visual Age
compiler set if the wrapper compilers (mpcc_r, mpf90_r) are used; the
library name (-lxlf90_r) is different, though.
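
For Open MPI specifically, what I end up doing is asking the wrapper
compilers for the Fortran link flags and splicing them into the C++
link line.  A minimal sketch (the solver library names and object
files here are placeholders):

    # let the wrappers report the MPI Fortran link line
    FORTRAN_LIBS=`mpif77 --showme:link`
    mpicxx -o omega3p *.o -ldmumps -lscalapack -lblacs $FORTRAN_LIBS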


best regards,
Rich Lee


On Jun 6, 2007, at 4:16 AM, Jeff Squyres wrote:


On Jun 5, 2007, at 11:17 PM, Lie-Quan Lee wrote:


it is quite a headache for each compiler/platform to deal with
mixed-language issues.  I have to compile my application with the IBM
Visual Age compiler, Pathscale, the Cray X1E compiler, intel/openmpi,
intel/mpich, the PGI compiler ...
And of course, openmpi 1.1 is different on this from openmpi 1.2.2
(-lmpi_f77 is new in the 1.2.2 version). :-)

You are right, the MPI Forum most likely will not take care of this.
I just made a wish ... :-)


Understood; I know it's a pain.  :-(

What I want to understand, however, is what you need to do.  It seems
like your needs are a bit different from those of the mainstream --
is there a way that we can support you directly instead of forcing
you to a) identify openmpi, b) call the wrapper compilers with
--showme:link to get the relevant flags, and c) stitch them together
in the manner that you need?

We take great pains to ensure that the mpi wrapper compilers
"just work" for all the common cases in order to avoid all the "you
must identify which MPI you are using" kinds of games.  Your case
sounds somewhat unusual, but perhaps there's a way we can get the
information to you in a more direct manner...?

--
Jeff Squyres
Cisco Systems




--
Jeff Squyres
Cisco Systems



Re: [OMPI users] how to identify openmpi in configure script

2007-06-08 Thread Anthony Chan


On Fri, 8 Jun 2007, Jeff Squyres wrote:

> Would it be helpful if we provided some way to link in all the MPI
> language bindings?
>
> Examples off the top of my head (haven't thought any of these through):
>
> - mpicxx_all ...
> - setenv OMPI_WRAPPER_WANT_ALL_LANGUAGE_BINDINGS
>mpicxx ...
> - mpicxx -ompi:all_languages ...
>

Maybe this wrapper should be called "mpild" or "mpilinker".

A.Chan



[OMPI users] v1.2.2 mca base unable to open pls/ras tm

2007-06-08 Thread Cupp, Matthew R
Hi,

 

I uninstalled and deleted our old installation directories of 1.1.4 and
1.2.1 so I could have it nice and clean for 1.2.2.  I extracted the
source and ran configure with these options:

--prefix=/opt/openmpi/st --with-devel-headers --with-tm=/opt/torque

 

I then built and installed it.  But when I ran a program, I got these
messages from each of my processes:

: mca: base: component_find: unable to open pls tm: File not found (ignored)

: mca: base: component_find: unable to open ras tm: File not found (ignored)

 

This was the first time that Open MPI was configured with --with-tm,
as Torque wasn't installed previously.  I found out that Torque was
not installed to /opt/torque as it was supposed to be, but to its
default location.  So I reran configure without --with-tm and rebuilt
and reinstalled (after uninstalling the previous build).  But I still
got the same messages.

So I completely deleted the source and destination directories,
extracted the source, ran configure, rebuilt, and installed.  But
still the same errors.  According to the Open MPI FAQ, support for
Torque must be explicitly added via configure
(http://www.open-mpi.org/faq/?category=building#build-rte-tm).  So is
it still including it somehow?

 

Thanks,

Matt

 

__
Matt Cupp
Battelle Memorial Institute
Statistics and Information Analysis



 



Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm

2007-06-08 Thread George Bosilca
"File not found" is the strerror corresponding to the error we get  
when we call dlopen. So I don't think it's directly related to the  
mca_pls_tm.so library but to one of it's missing dependencies.


Do you have access to the /opt/torque directory on all nodes in your  
cluster ?
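
One quick way to find a missing dependency (a sketch; the path assumes
the --prefix from your configure line):

    ldd /opt/openmpi/st/lib/openmpi/mca_pls_tm.so | grep "not found"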


  george.



Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm

2007-06-08 Thread Cupp, Matthew R
Yes.  But the /opt/torque directory is just the source, not the actual
installed directory.  The actual installed directory on the head node
is the default location of /usr/lib/something.  And that is not
accessible by every node.

But should it matter if it's not accessible if I don't specify
--with-tm?  I was wondering if ./configure detects that Torque has
been installed, and then builds the associated components under the
assumption that it's available.

Matt

__
Matt Cupp
Battelle Memorial Institute
Statistics and Information Analysis
614-424-5471





Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm

2007-06-08 Thread Jeff Squyres

On Jun 8, 2007, at 2:06 PM, Cupp, Matthew R wrote:

> Yes.  But the /opt/torque directory is just the source, not the
> actual installed directory.  The actual installed directory on the
> head node is the default location of /usr/lib/something.  And that
> is not accessible by every node.
>
> But should it matter if it's not accessible if I don't specify
> --with-tm?  I was wondering if ./configure detects that Torque has
> been installed, and then builds the associated components under the
> assumption that it's available.


This is what OMPI does.

However, if you only have static libraries for Torque, the issue  
should be moot -- the relevant bits should be statically linked into  
the OMPI tm plugins.  But if your Torque libraries are shared, then  
you do need to have them available on all nodes for OMPI to be able  
to leverage native Torque/TM support.
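
A quick way to tell which case you are in (the path is per the earlier
messages in this thread; adjust as needed):

    ls /usr/lib/libtorque*   # libtorque.a = static, libtorque.so = shared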


Make sense?

--
Jeff Squyres
Cisco Systems



Re: [OMPI users] mpirun in openmpi-1.2.2 fails to exit after client program finishes

2007-06-08 Thread Jeff Squyres

On Jun 8, 2007, at 9:29 AM, Code Master wrote:


I compiled openmpi-1.2.2 with:

./configure CFLAGS="-g -pg -O3" \
  --prefix=/home/foo/490_research/490/src/mpi.optimized_profiling/ \
  --enable-mpi-threads --enable-progress-threads --enable-static \
  --disable-shared --without-memory-manager --without-libnuma \
  --disable-mpi-f77 --disable-mpi-f90 --disable-mpi-cxx \
  --disable-mpi-cxx-seek --disable-dlopen

(Thanks Jeff, now I know that I have to add --without-memory-manager
and --without-libnuma for static linking)


Good.


make all
make install

then I run my client app with:

/home/foo/490_research/490/src/mpi.optimized_profiling/bin/mpirun \
  --hostfile ../hostfile -n 32 raytrace -finputs/car.env

The program runs well and each process completes successfully (I can
tell because all processes have generated gmon.out successfully, and a
"ps aux" on the slave nodes (all except the originating node) shows
that my program has already exited there).  Therefore I think this may
have something to do with mpirun, which hangs forever.


Be aware that you may have problems with multiple processes writing  
to the same gmon.out, unless you're running each instance in a  
different directory (your command line doesn't indicate that you are,  
but that doesn't necessarily prove anything).


Can you see anything wrong in my ./configure command which explains  
the mpirun hang at the end of the run?  How can I fix it?


No, everything looks fine.

So you confirm that all raytrace instances have exited and all orteds
have exited, leaving *only* mpirun running?


There was a race condition about this at one point; Ralph -- can you  
comment further?


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Communication Latency

2007-06-08 Thread Jeff Squyres

The answer is "it depends"; there are a lot of factors involved.

- What is the topology of your network?
- Where do processes land within the topology of the network?
- What interconnect are you using?  (e.g., the openib BTL will usually
  use short-message RDMA to a limited set of peers as an optimization)
- How long are your messages?

OMPI does not have any special optimizations for point-to-point  
communications for MPI_COMM_WORLD ranks that happen to be powers of  
two.  Other factors may contribute to make that true for your runs,  
but there's nothing hard-coded in Open MPI for that.
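
For anyone reproducing this kind of measurement, here is a minimal
ping-pong sketch (not from the original thread; the message size and
iteration count are arbitrary):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, i, iters = 10000;
        char buf[8] = {0};
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < iters; ++i) {
            if (rank == 0) {        /* rank 0 sends, then waits for the echo */
                MPI_Send(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) { /* rank 1 echoes the message back */
                MPI_Recv(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();
        if (rank == 0)              /* one-way latency = half the round trip */
            printf("latency: %g us\n", (t1 - t0) / (2.0 * iters) * 1.0e6);
        MPI_Finalize();
        return 0;
    }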




On Jun 5, 2007, at 1:10 PM, Andy Georgi wrote:


Hi everybody,

I'm new on this list and started using OpenMPI for my parallel jobs.
The first step was to measure the latency of the blocking
communication functions.  Now my first question: is it possible that
certain communication pairs are optimized?

Background:

The latency for certain process numbers is nearly 25% smaller, e.g.
for processes 1, 2, 4, 8, 16, 32, 64... (every computer scientist
should see the pattern ;-)).  It doesn't matter which process I send
the message from: if the receiver is one of these processes, I get the
best latency values.  It's not possible that this effect comes from
the network, because communication from proc5 to proc32, e.g., is
faster than communication from proc32 to proc5.  I've tried it with
OpenMPI for Intel 1.1.4 and 1.2.2 and OpenMPI for PGI 1.2.2, always
with the same results.  So now I think it must be some kind of
optimization; if so, I would like to know, because then I have an
explanation ;-).

Thanks and regards,

Andy



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Issues running a basic program with spawn

2007-06-08 Thread Jeff Squyres

On Jun 5, 2007, at 10:27 AM, Prakash Velayutham wrote:


I know.  I could not get another client code to start before this, so
I just wanted to check whether /bin/hostname works with the spawn.


It will not.  MPI_COMM_SPAWN assumes that you are spawning an MPI
application, and therefore after the process is launched, it tries to
do MPI-level coordination with it to set up new communicators, etc.
FWIW: MPI-2 says that you are *only* allowed to launch MPI processes
through MPI_COMM_SPAWN[_MULTIPLE].


This could well be the error that you are seeing (I haven't tried it  
myself to see what would happen).
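
For reference, a minimal parent that spawns an MPI-enabled child (a
sketch; "mpi_child" is a hypothetical executable that itself calls
MPI_Init):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm intercomm;

        MPI_Init(&argc, &argv);
        /* spawn 2 copies of "mpi_child"; per the discussion above, the
           spawned program must be an MPI program */
        MPI_Comm_spawn("mpi_child", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
        /* ...talk to the children over the intercommunicator... */
        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }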


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm

2007-06-08 Thread Cupp, Matthew R
So I either have to uninstall torque, make the shared libraries
available on all nodes, or have torque as static libraries on the head
node?

__
Matt Cupp
Battelle Memorial Institute
Statistics and Information Analysis


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: Friday, June 08, 2007 2:21 PM
To: Open MPI Users
Subject: Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm

On Jun 8, 2007, at 2:06 PM, Cupp, Matthew R wrote:

> Yes.  But the /opt/torque directory is just the source, not the actual
> installed directory.  The actual installed directory on the head  
> node is
> the default location of /usr/lib/something.  And that is not  
> accessable
> by every node.
>
> But should it matter if it's not accessable if I don't specify
> --with-tm?  I was wondering if ./configure detects torque has been
> installed, and then builds the associated components under the
> assumption that it's available.

This is what OMPI does.

However, if you only have static libraries for Torque, the issue  
should be moot -- the relevant bits should be statically linked into  
the OMPI tm plugins.  But if your Torque libraries are shared, then  
you do need to have them available on all nodes for OMPI to be able  
to leverage native Torque/TM support.

Make sense?

-- 
Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] v1.2.2 mca base unable to open pls/ras tm

2007-06-08 Thread Brian Barrett
Or tell Open MPI not to build Torque support, which can be done at
configure time with the --without-tm option.

Open MPI tries to build support for whatever it finds in the default
search paths, plus whatever you specify the location of.  Most of the
time, this is what the user wants.  In this case, however, it's not
what you wanted, so you'll have to add the --without-tm option.
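
For this cluster, that would look something like (the other options
carried over from your original configure line):

    ./configure --prefix=/opt/openmpi/st --with-devel-headers --without-tm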


Hope this helps,

Brian






Re: [OMPI users] Issues running a basic program with spawn

2007-06-08 Thread Ralph Castain
My apologies - Prakash and I solved this off-list. I should have posted the
final solution here too so any interested parties would know the answer.

The problem actually is a bug that broke comm_spawn in 1.2.2 and may well be
present in the entire 1.2 code series (I have not checked the prior
sub-releases). I provided a patch to Prakash that solves the problem, and
have requested that a slightly different version be released as part of
1.2.3.

Sorry for forgetting to post this back to the list. Anyone needing the patch
for 1.2.2 prior to the next sub-release should just let me know and I'll
provide it.

Ralph







Re: [OMPI users] mixing MX and TCP

2007-06-08 Thread George Bosilca
A fix for this problem is now available on the trunk.  Please use any
revision after 14963 and your problem will vanish [I hope!].  There
are now some additional parameters which allow you to select which
Myrinet network you want to use in case there are several available
(--mca btl_mx_if_include and --mca btl_mx_if_exclude).  Even
multi-rail configurations should now work over MX.
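
For example, something along these lines (the interface value here is
a placeholder; check ompi_info after updating for the exact parameter
semantics):

    mpirun --mca btl mx,tcp,sm,self \
           --mca btl_mx_if_include myri0 \
           --hostfile hosts -np 16 ./app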


  george.

On May 31, 2007, at 12:09 PM, Kees Verstoep wrote:


Hi,

I am currently experimenting with OpenMPI in a multi-cluster setting
where each cluster has its private Myri-10G/MX network besides TCP.
Somehow I was under the assumption that OpenMPI would dynamically find
out the details of this configuration, and use MX where possible
(i.e., intra cluster), and TCP elsewhere.
But from some initial testing, it appears OpenMPI-1.2.1 assumes global
connectivity over MX when every participating host supports MX.  I see
MX connections rather than TCP-level connections being tried between
clusters, which fails in mx_connect/mx_isend (at the moment there is
no inter-cluster support in MX itself).  Besides "mx", I do include
"tcp" in the network option lists, of course.

Is this just something that is not yet supported in the current
release, or does it work by providing some extra parameters?
I have not started digging in the code yet.

Thanks!
Kees Verstoep


Re: [OMPI users] mpirun in openmpi-1.2.2 fails to exit after client program finishes

2007-06-08 Thread Code Master

On 6/9/07, Jeff Squyres  wrote:

On Jun 8, 2007, at 9:29 AM, Code Master wrote:

> I compiled openmpi-1.2.2 with:
>
> ./configure CFLAGS="-g -pg -O3" \
>   --prefix=/home/foo/490_research/490/src/mpi.optimized_profiling/ \
>   --enable-mpi-threads --enable-progress-threads --enable-static \
>   --disable-shared --without-memory-manager --without-libnuma \
>   --disable-mpi-f77 --disable-mpi-f90 --disable-mpi-cxx \
>   --disable-mpi-cxx-seek --disable-dlopen
>
> (Thanks Jeff, now I know that I have to add --without-memory-manager
> and --without-libnuma for static linking)

Good.

> make all
> make install
>
> then I run my client app with:
>
> /home/foo/490_research/490/src/mpi.optimized_profiling/bin/mpirun \
>   --hostfile ../hostfile -n 32 raytrace -finputs/car.env
>
> The program runs well and each process completes successfully (I can
> tell because all processes have generated gmon.out successfully, and
> a "ps aux" on the slave nodes (all except the originating node) shows
> that my program has already exited there).  Therefore I think this
> may have something to do with mpirun, which hangs forever.

Be aware that you may have problems with multiple processes writing
to the same gmon.out, unless you're running each instance in a
different directory (your command line doesn't indicate that you are,
but that doesn't necessarily prove anything).


I am sure this is not happening, because in my program, after the MPI
initialization, main() invokes chdir(), which immediately changes the
working directory to the process's own directory (named after the
proc_id).  Therefore each process has its own directory to write to.
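
Roughly like this (a sketch; the directory naming here is simplified):

    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank;
        char dir[64];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        /* each rank moves into its own directory so the per-process
           gmon.out files cannot collide */
        snprintf(dir, sizeof(dir), "proc_%d", rank);
        if (chdir(dir) != 0)
            perror("chdir");
        /* ... rest of the program ... */
        MPI_Finalize();
        return 0;
    }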


> Can you see anything wrong in my ./configure command which explains
> the mpirun hang at the end of the run?  How can I fix it?

No, everything looks fine.

So you confirm that all raytrace instances have exited and all orteds
have exited, leaving *only* mpirun running?


Yes, I am sure that all raytrace instances as well as all MPI-related
processes (including mpirun, the orteds, etc.) have exited on all
slave nodes.  On the *master* node, all raytrace instances and all
orteds have exited as well, leaving *only* mpirun running on the
*master* node:

14818 pts/0  S+  0:00
/home/foo/490_research/490/src/mpi.optimized_profiling/bin/mpirun
--hostfile ../hostfile -n 32 raytrace -finputs/car.env -s 1

There was a race condition about this at one point; Ralph -- can you
comment further?

--
Jeff Squyres
Cisco Systems
