[OMPI users] Ethernet tuning on Solaris Opteron ?

2006-03-14 Thread Pierre Valiron

I am now attempting to tune openmpi-1.1a1r9260 on Solaris Opteron.

Each quad-processor node possesses two Ethernet interfaces, bge0 and bge1.
The bge0 interfaces are dedicated to parallel jobs and correspond to the
node names pxx; they use a dedicated gigabit switch.
The bge1 interfaces provide NFS sharing etc. and correspond to the node
names nxx over another gigabit switch.


1) I allocated 4 quad-processor nodes.
As documented in the FAQ, mpirun -np 4 -hostfile $OAR_FILE_NODES runs 4 
tasks on the first SMP, and mpirun -np 4 -hostfile $OAR_FILE_NODES 
--bynode distributes a task on each node.


2) According to the users list, mpirun --mca pml teg should revert to the
2nd-generation TCP transport instead of the default ob1 (3rd generation).
Unfortunately I get the message

No available pml components were found!

Have you removed the 2nd-generation TCP transport? Do you consider the
new ob1 competitive now?


3) According to the users list, tuned collective primitives are
available. Apparently they are now compiled by default, but they don't
seem functional at all:


mpirun --mca coll tuned
Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
Failing at addr:0
*** End of error message ***

4) According to the FAQ and to the users list, Open MPI attempts to
discover and use all interfaces. I attempted to force it to use only
bge0, with no success.


mpirun --mca btl_tcp_if_exclude bge1
[n33:04784] *** An error occurred in MPI_Barrier
[n33:04784] *** on communicator MPI_COMM_WORLD
[n33:04784] *** MPI_ERR_INTERN: internal error
[n33:04784] *** MPI_ERRORS_ARE_FATAL (goodbye)
1 process killed (possibly by Open MPI)

The FAQ states that a new syntax should be available soon. I checked
whether it is already implemented in openmpi-1.1a1r9260:


mpirun --mca btl_tcp_if ^bge0,bge1
mpirun --mca btl_tcp_if ^bge1

Both work with identical performance.

However, I doubt this option is functional, because if I disable all
Ethernet interfaces,

mpirun --mca btl_tcp_if ^bge0,bge1

the job still works!

I would be happy to have more control over the interfaces being used.

What is expected to work on other platforms?
What issues could be specific to Solaris on Opteron?

Have a nice openmpi day!

--
Soutenez le mouvement SAUVONS LA RECHERCHE :
http://recherche-en-danger.apinc.org/

   _/_/_/_/_/   _/   Dr. Pierre VALIRON
  _/ _/   _/  _/   Laboratoire d'Astrophysique
 _/ _/   _/ _/Observatoire de Grenoble / UJF
_/_/_/_/_/_/BP 53  F-38041 Grenoble Cedex 9 (France)
   _/  _/   _/http://www-laog.obs.ujf-grenoble.fr/~valiron/
  _/  _/  _/ Mail: pierre.vali...@obs.ujf-grenoble.fr
 _/  _/ _/  Phone: +33 4 7651 4787  Fax: +33 4 7644 8821
_/  _/_/





Re: [OMPI users] MPI_COMM_SPAWN f90 interface bug?

2006-03-14 Thread Jeff Squyres
> -Original Message-
> > [-:13327] mca: base: component_find: unable to open:
> > dlopen(/usr/local/lib/openmpi/mca_pml_teg.so, 9): Symbol not found:
> > _mca_ptl_base_recv_request_t_class
> >   Referenced from: /usr/local/lib/openmpi/mca_pml_teg.so
> >   Expected in: flat namespace
> > (ignored)
> 
> I have determined that the above error/warning is caused by 
> installing openmpi1.1r9212 on a machine where openmpi1.0.1 was 
> previously installed.  I had to manually delete the files in 
> /usr/local/lib/openmpi and then reinstall.  This implies an 
> error with the 1.1 install script.

Just to clarify on this issue -- Open MPI uses Automake for its installation
/ uninstallation.  As such, it only copies in the files that are relevant to
each version of Open MPI.  It does *not* uninstall any previous versions of
Open MPI.  Specifically, the sets of plugins installed by Open MPI 1.0.x and
1.1.x are different.  When you installed Open MPI 1.1.x over the same tree as
1.0.x, although most of the 1.0.x plugins were overwritten, some were not
(because they only exist in 1.0.x).  At run time, Open MPI 1.1.x tried to open
those 1.0.x plugins, which resulted in the "symbol not found" errors that you
saw.

So this is actually exactly what the Open MPI installation process is
supposed to do (only touch the files that are relevant to it, not any
others).  We could probably be a bit smarter and not have Open MPI try to
open plugins from earlier versions, but that's a low priority at the moment.
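
For the record, the workaround you used is the right one; roughly, assuming
the default /usr/local prefix:

  rm -rf /usr/local/lib/openmpi    # remove the stale 1.0.x plugins
  make install                     # reinstall from the 1.1 build tree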

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [OMPI users] Ethernet tuning on Solaris Opteron ?

2006-03-14 Thread Brian Barrett

On Mar 14, 2006, at 4:42 AM, Pierre Valiron wrote:


I am now attempting to tune openmpi-1.1a1r9260 on Solaris Opteron.


I guess I should have pointed this out more clearly earlier.  Open
MPI 1.1a1 is a nightly alpha build from our development trunk.  It
isn't guaranteed to be stable.  About the only guarantee made is that
it passed "make distcheck" on the Linux box we use to make tarballs.


The Solaris patches have been moved over to the v1.0 release branch,  
so if stability is a concern, you might want to switch back to a  
nightly tarball from the v1.0 release.  We should also have
another beta of the 1.0.2 release in the near future.



Each quad-processor node possesses two Ethernet interfaces, bge0 and bge1.
The bge0 interfaces are dedicated to parallel jobs and correspond to the
node names pxx; they use a dedicated gigabit switch.
The bge1 interfaces provide NFS sharing etc. and correspond to the node
names nxx over another gigabit switch.

1) I allocated 4 quad-processor nodes.
As documented in the FAQ, mpirun -np 4 -hostfile $OAR_FILE_NODES runs 4
tasks on the first SMP, and mpirun -np 4 -hostfile $OAR_FILE_NODES
--bynode distributes a task on each node.

2) According to the users list, mpirun --mca pml teg should revert to the
2nd-generation TCP transport instead of the default ob1 (3rd generation).
Unfortunately I get the message

No available pml components were found!

Have you removed the 2nd-generation TCP transport? Do you consider the
new ob1 competitive now?


On the development trunk, we have removed the TEG PML and all the
PTLs.  For most transports, the OB1 PML provides performance that is
competitive with (and most of the time better than) the TEG PML.  The
major issue is that when we added one-sided communication, we used the
BTL transports directly.  The BTL and PTL frameworks were not designed
to live together, which caused issues with the TEG PML.



3) According to the users list, tuned collective primitives are
available. Apparently they are now compiled by default, but they don't
seem functional at all:

mpirun --mca coll tuned
Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
Failing at addr:0
*** End of error message ***


Tuned collectives are available, but not as heavily tested as the  
basic collectives.  Do you have a test case in particular that causes  
problems?
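
If you don't have one handy, even a minimal program along these lines
(just a sketch exercising a few collectives, not a known-failing case),
run with --mca coll tuned, would help us narrow it down:

#include <mpi.h>
#include <stdio.h>

/* Minimal collective smoke test (a sketch, not a known-failing case). */
int main(int argc, char **argv)
{
    int rank, size, sum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    MPI_Bcast(&sum, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks = %d (expected %d)\n", sum, size * (size - 1) / 2);

    MPI_Finalize();
    return 0;
}

e.g. mpirun -np 4 --bynode --mca coll tuned ./coll_test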



4) According to the FAQ and to the users list, Open MPI attempts to
discover and use all interfaces. I attempted to force it to use only
bge0, with no success.

mpirun --mca btl_tcp_if_exclude bge1
[n33:04784] *** An error occurred in MPI_Barrier
[n33:04784] *** on communicator MPI_COMM_WORLD
[n33:04784] *** MPI_ERR_INTERN: internal error
[n33:04784] *** MPI_ERRORS_ARE_FATAL (goodbye)
1 process killed (possibly by Open MPI)


That definitely shouldn't happen.  Can you reconfigure and recompile
with the option --enable-debug, then run with the added option --mca
btl_base_debug 2 and send us the output you see?  That might help in
diagnosing the problem.
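
For example, with ./your_app standing in for your test program:

./configure --enable-debug [your other configure options]
make all install
mpirun -np 4 -hostfile $OAR_FILE_NODES --bynode \
    --mca btl_tcp_if_exclude bge1 --mca btl_base_debug 2 ./your_app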



The FAQ states that a new syntax should be available soon. I checked
whether it is already implemented in openmpi-1.1a1r9260:

mpirun --mca btl_tcp_if ^bge0,bge1
mpirun --mca btl_tcp_if ^bge1

Both work with identical performance.


This syntax only works for specifying component names, not interface  
names.  So you would still need to use the btl_tcp_if_include and  
btl_tcp_if_exclude options.
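
For example, to keep TCP traffic on bge0 you would use the interface
parameter, while the '^' syntax applies to whole components:

mpirun --mca btl_tcp_if_include bge0 ...    (select an interface)
mpirun --mca btl ^tcp ...                   (disable the entire TCP component)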


Brian


--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/




Re: [OMPI users] MPI_COMM_SPAWN f90 interface bug?

2006-03-14 Thread Michael Kluskens
I see responses to noncritical parts of my discussion but not to the
following. Is it a known issue, a fixed issue, or an issue we don't
want to discuss?


Michael

On Mar 7, 2006, at 4:39 PM, Michael Kluskens wrote:


The following errors/warnings also exist when running my spawn test
on a clean installation of r9212.


[-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file base/soh_base_get_proc_soh.c at line 100
[-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file base/oob_base_xcast.c at line 108
[-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file base/rmgr_base_stage_gate.c at line 276
[-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file base/soh_base_get_proc_soh.c at line 100
[-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file base/oob_base_xcast.c at line 108
[-:13323] [0,0,0] ORTE_ERROR_LOG: GPR data corruption in file base/rmgr_base_stage_gate.c at line 276

OS X 10.4.5 with g95 from the current fink install for FC & F77.  Running
on a single machine and launching a single spawned subprocess as a
test case for now.  Also on Debian Sarge on Opteron, built using
"./configure --with-gnu-ld F77=pgf77 FFLAGS=-fastsse FC=pgf90
FCFLAGS=-fastsse" with PG 6.1.

Do these diagnostic messages indicate errors in Open MPI 1.1r9212, or
are they related to errors in my test code?

Is this information helpful for development purposes?


[OMPI users] comm_join and singleton init

2006-03-14 Thread Robert Latham
Hi

I've got a bit of an odd bug here.  I've been playing around with MPI
process management routines and I noticed the following behavior with
openmpi-1.0.1:

Two processes (a and b), linked with ompi, but started independently
(no mpiexec, just started the programs directly).

- a and b: call MPI_Init
- a: open a unix network socket on 'fd'
- b: connect to a's socket
- a and b: call MPI_Comm_join over 'fd'
- a and b: call MPI_Intercomm_merge, get intracommunicator.

These steps all work fine. 

Now the odd part: a and b call MPI_Comm_rank and MPI_Comm_size over
the intracommunicator.  Both (correctly) think Comm_size is two, but
both also think (incorrectly) that they are rank 1.  
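
For reference, the two programs boil down to something like the
following (a condensed sketch, not the exact test code):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Condensed sketch of the scenario above, not the exact test code.
 * Start two copies by hand (no mpiexec), on the same machine:
 *   ./join server <port>
 *   ./join client <port>
 */
int main(int argc, char **argv)
{
    int fd, rank, size;
    struct sockaddr_in addr;
    MPI_Comm inter, intra;

    MPI_Init(&argc, &argv);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(atoi(argv[2]));
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (strcmp(argv[1], "server") == 0) {            /* process a */
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        bind(lfd, (struct sockaddr *) &addr, sizeof(addr));
        listen(lfd, 1);
        fd = accept(lfd, NULL, NULL);
    } else {                                         /* process b */
        fd = socket(AF_INET, SOCK_STREAM, 0);
        connect(fd, (struct sockaddr *) &addr, sizeof(addr));
    }

    MPI_Comm_join(fd, &inter);                 /* join over the socket     */
    MPI_Intercomm_merge(inter, 0, &intra);     /* both sides pass high = 0 */

    MPI_Comm_rank(intra, &rank);
    MPI_Comm_size(intra, &size);
    printf("rank %d of %d\n", rank, size);     /* both report rank 1 of 2  */

    MPI_Finalize();
    return 0;
}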

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B


[OMPI users] MPI_Comm_connect and singleton init

2006-03-14 Thread Robert Latham
Hello
In playing around with process management routines, I found another
issue.  This one might very well be operator error, or something
implementation specific.

I've got two processes (a and b), linked with openmpi, but started
independently (no mpiexec).  

- A starts up and calls MPI_Init
- A calls MPI_Open_port, prints out the port name to stdout, then
  calls MPI_Comm_accept and blocks.
- B takes as a command line argument the port
  name printed out by A.  It calls MPI_Init and then passes that
  port name to MPI_Comm_connect
- B gets the following error:

[leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
in file ../../../orte/dps/dps_unpack.c at line 121
[leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
in file ../../../orte/dps/dps_unpack.c at line 95
[leela.mcs.anl.gov:04177] *** An error occurred in MPI_Comm_connect
[leela.mcs.anl.gov:04177] *** on communicator MPI_COMM_WORLD
[leela.mcs.anl.gov:04177] *** MPI_ERR_UNKNOWN: unknown error
[leela.mcs.anl.gov:04177] *** MPI_ERRORS_ARE_FATAL (goodbye)
[leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Not found in file
../../../../../orte/mca/pls/base/pls_base_proxy.c at line 183

- A is still waiting for someone to connect to it.
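
For reference, the two programs boil down to something like this (a
condensed sketch, not the exact code):

#include <mpi.h>
#include <stdio.h>

/* Condensed sketch, not the exact code.
 * Process A:  ./port            (prints the port name, then blocks in accept)
 * Process B:  ./port "<name>"   (pass the port name printed by A, quoted)
 */
int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    MPI_Init(&argc, &argv);

    if (argc == 1) {                                              /* A */
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("%s\n", port);
        fflush(stdout);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    } else {                                                      /* B */
        MPI_Comm_connect(argv[1], MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    }

    MPI_Finalize();
    return 0;
}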

Did I pass MPI port strings between programs the correct way, or is
MPI_Publish_name/MPI_Lookup_name the preferred way to pass around this
information?

Thanks
==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B


Re: [OMPI users] comm_join and singleton init

2006-03-14 Thread Edgar Gabriel
Could you provide me with a simple test code for that? Comm_join and
Intercomm_merge should work; I will have a look at it...


(separate answer to your second email is coming soon)
Thanks
Edgar

Robert Latham wrote:


Hi

I've got a bit of an odd bug here.  I've been playing around with MPI
process management routines and I noticed the following behavior with
openmpi-1.0.1:

Two processes (a and b), linked with ompi, but started independently
(no mpiexec, just started the programs directly).

- a and b: call MPI_Init
- a: open a unix network socket on 'fd'
- b: connect to a's socket
- a and b: call MPI_Comm_join over 'fd'
- a and b: call MPI_Intercomm_merge, get intracommunicator.

These steps all work fine. 


Now the odd part: a and b call MPI_Comm_rank and MPI_Comm_size over
the intracommunicator.  Both (correctly) think Comm_size is two, but
both also think (incorrectly) that they are rank 1.  


==rob





Re: [OMPI users] MPI_Comm_connect and singleton init

2006-03-14 Thread Edgar Gabriel

You are touching on a difficult area in Open MPI here:

- Name publishing across independent jobs unfortunately does not work
right now. (It does work if all processes have been started by the same
mpirun or if they have been spawned by a parent process using
MPI_Comm_spawn.) Your approach of passing the port as a command line
option should work, however.


- However, you have to start the orted daemon *before* starting both jobs,
using the flags

'orted --seed --persistent --scope public'

These flags are currently only lightly tested, since a brand new
runtime environment with much better support for these operations is
currently under development.


- Regarding the 'pack data mismatch': do both machines you are using
have the same data representation? The reason I ask is that this looks
like a data-type mismatch error, and Open MPI currently has some
restrictions regarding different data formats and endianness...


Thanks
Edgar

Robert Latham wrote:


Hello
In playing around with process management routines, I found another
issue.  This one might very well be operator error, or something
implementation specific.

I've got two processes (a and b), linked with openmpi, but started
independently (no mpiexec).  


- A starts up and calls MPI_Init
- A calls MPI_Open_port, prints out the port name to stdout, then
  calls MPI_Comm_accept and blocks.
- B takes as a command line argument the port
  name printed out by A.  It calls MPI_Init and then passes that
  port name to MPI_Comm_connect
- B gets the following error:

[leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
in file ../../../orte/dps/dps_unpack.c at line 121
[leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Pack data mismatch
in file ../../../orte/dps/dps_unpack.c at line 95
[leela.mcs.anl.gov:04177] *** An error occurred in MPI_Comm_connect
[leela.mcs.anl.gov:04177] *** on communicator MPI_COMM_WORLD
[leela.mcs.anl.gov:04177] *** MPI_ERR_UNKNOWN: unknown error
[leela.mcs.anl.gov:04177] *** MPI_ERRORS_ARE_FATAL (goodbye)
[leela.mcs.anl.gov:04177] [0,0,0] ORTE_ERROR_LOG: Not found in file
../../../../../orte/mca/pls/base/pls_base_proxy.c at line 183

- A is still waiting for someone to connect to it.

Did I pass MPI port strings between programs the correct way, or is
MPI_Publish_name/MPI_Lookup_name the preferred way to pass around this
information?

Thanks
==rob



--
Edgar Gabriel
Assistant Professor
Department of Computer Science  email:gabr...@cs.uh.edu
University of Houston   http://www.cs.uh.edu/~gabriel
Philip G. Hoffman Hall, Room 524Tel: +1 (713) 743-3857
Houston, TX-77204, USA  Fax: +1 (713) 743-3335


Re: [OMPI users] comm_join and singleton init

2006-03-14 Thread Edgar Gabriel
I think I know what goes wrong. Since they are in different 'universes',
they will have exactly the same 'Open MPI name', and therefore the
algorithm in intercomm_merge cannot determine which process should be
first and which second.


Practically, all jobs which are connected at some point in their
lifetime have to be in the same MPI universe, so that all jobs will
have different jobids and therefore different names. To use the same
universe, you have to start the orted daemon in persistent mode, so
the sequence should be:


orted --seed --persistent --scope public
mpirun -np x ./app1
mpirun -np y ./app2

In this case everything should work as expected: you can do the
comm_join between app1 and app2, and the intercomm_merge should work as well.


Hope this helps
Edgar

Edgar Gabriel wrote:

Could you provide me with a simple test code for that? Comm_join and
Intercomm_merge should work; I will have a look at it...


(separate answer to your second email is coming soon)
Thanks
Edgar

Robert Latham wrote:



Hi

I've got a bit of an odd bug here.  I've been playing around with MPI
process management routines and I noticed the following behavior with
openmpi-1.0.1:

Two processes (a and b), linked with ompi, but started independently
(no mpiexec, just started the programs directly).

- a and b: call MPI_Init
- a: open a unix network socket on 'fd'
- b: connect to a's socket
- a and b: call MPI_Comm_join over 'fd'
- a and b: call MPI_Intercomm_merge, get intracommunicator.

These steps all work fine. 


Now the odd part: a and b call MPI_Comm_rank and MPI_Comm_size over
the intracommunicator.  Both (correctly) think Comm_size is two, but
both also think (incorrectly) that they are rank 1.  


==rob






--
Edgar Gabriel
Assistant Professor
Department of Computer Science  email:gabr...@cs.uh.edu
University of Houston   http://www.cs.uh.edu/~gabriel
Philip G. Hoffman Hall, Room 524Tel: +1 (713) 743-3857
Houston, TX-77204, USA  Fax: +1 (713) 743-3335


Re: [OMPI users] MPI_Comm_connect and singleton init

2006-03-14 Thread Robert Latham
On Tue, Mar 14, 2006 at 12:00:57PM -0600, Edgar Gabriel wrote:
> you are touching here a difficult area in Open MPI:

I don't doubt it.  I haven't found an MPI implementation yet that does
this without any quirks or oddities :>

> - Name publishing across independent jobs unfortunately does not work
>   right now. (It does work if all processes have been started by the
>   same mpirun or if they have been spawned by a parent process using
>   MPI_Comm_spawn.) Your approach of passing the port as a command
>   line option should work, however.
> 
> - However, you have to start the orted daemon *before* starting both
>   jobs, using the flags 'orted --seed --persistent --scope public'.
>   These flags are currently only lightly tested, since a brand new
>   runtime environment with much better support for these operations
>   is currently under development.

Ok, got it.  If there is some sort of setup beforehand (in this
case, launching orted), then these independent MPI processes will
have a much easier time talking to each other.  Makes sense.

> - Regarding the 'pack data mismatch': do both machines you are using
>   have the same data representation? The reason I ask is that this
>   looks like a data-type mismatch error, and Open MPI currently has
>   some restrictions regarding different data formats and endianness...

I'm just running this on the same machine.

Thanks for the quick response.
==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B