[OMPI users] Spawn and distribution of slaves

2006-03-02 Thread Jean Latour

Hello,

Testing the MPI_Comm_spawn function of Open MPI version 1.0.1, I have an example that works OK, except that it shows that the spawned processes do not follow the processor placement given in the "machinefile".
In this example a master process first spawns 2 processes, then disconnects from them and spawns 2 more processes. Running on a quad-Opteron node, all processes end up on the same node, although the machinefile specifies that the slaves should run on different nodes.

With the current version of Open MPI, is it possible to direct the spawned processes to a specific node? (The node distribution could be given in the "machinefile", as with LAM/MPI.)


The Fortran 90 code of this example and its makefile are attached as a tar file.


Thank you very much

Jean Latour




spawn+connect.tar.gz
Description: Binary data

Re: [OMPI users] Spawn and distribution of slaves

2006-03-03 Thread Jean Latour
Thanks for your answer. Your example addresses one possible situation where a parallel application is spawned by a driver with MPI_Comm_spawn, or multiple parallel applications are spawned at the same time with MPI_Comm_spawn_multiple, over a set of processors described in the machinefile. It is OK if the next spawn occurs after some processes at the beginning of the machinefile have stopped.

However, I have another case in hand where the spawned processes are truly dynamic over time. Any child process can stop (not necessarily the first one in the machinefile), freeing processors on which the newly spawned processes should run. With LAM/MPI this situation has a satisfactory solution through the INFO parameter of MPI_Comm_spawn, which allows specifying a "local" machinefile for these spawned processes, instead of always starting from the same machinefile from the beginning, as in your example.
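For illustration, here is a minimal C sketch of the INFO-driven placement I have in mind (this is not my attached Fortran code; the "host" key and the node names are only illustrative assumptions, and the LAM/MPI key differs):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    /* Placement hint for the children; the key name "host" and the node
       names are assumptions for illustration only. */
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "linux14,linux15");

    /* Spawn 2 children, asking the runtime to place them per the hint. */
    MPI_Comm_spawn("./slave", MPI_ARGV_NULL, 2, info, 0,
                   MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    /* The children must call MPI_Comm_disconnect on their parent
       communicator for this collective call to complete. */
    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}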


Do you know if this specific feature will be implemented in Open MPI (I hope it will be), and possibly when? Dynamic applications really need this.

Best Regards,
Jean Latour

Edgar Gabriel wrote:


So for my tests, Open MPI did follow the machinefile (see the output further below); however, each spawn operation starts from the very beginning of the machinefile...

The following example spawns 5 child processes (with a single
MPI_Comm_spawn), and each child prints its rank and the hostname.

gabriel@linux12 ~/dyncomm $ mpirun -hostfile machinefile  -np 3
./dyncomm_spawn_father
 Checking for MPI_Comm_spawn.working
Hello world from child 0 on host linux12
Hello world from child 1 on host linux13
Hello world from child 3 on host linux15
Hello world from child 4 on host linux16
 Testing Send/Recv on the intercomm..working
Hello world from child 2 on host linux14


with the machinefile being:
gabriel@linux12 ~/dyncomm $ cat machinefile
linux12
linux13
linux14
linux15
linux16

In your code, you always spawn one process at a time, and that is why they are all located on the same node.
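To make this concrete, here is a rough C sketch of the two patterns (the child executable name and the counts are just placeholders): a loop of single-process spawns, where every call starts again at the top of the machinefile, versus one call that spawns all children at once and therefore walks down the machinefile:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm inter[5], intercomm;
    int i;

    MPI_Init(&argc, &argv);

    /* Pattern 1: five separate spawns of 1 process each.  Each call
       restarts at the first machinefile entry, so all children land
       on the same node. */
    for (i = 0; i < 5; i++)
        MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &inter[i], MPI_ERRCODES_IGNORE);

    /* Pattern 2: one spawn of 5 processes.  The children are laid out
       across the machinefile entries, as in the output above. */
    MPI_Comm_spawn("./child", MPI_ARGV_NULL, 5, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    MPI_Finalize();
    return 0;
}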


Hope this helps...
Edgar


Edgar Gabriel wrote:

 

As far as I know, Open MPI should follow the machinefile for spawn operations, starting however at the beginning of the machinefile again for every spawn. An info object such as 'lam_sched_round_robin' is currently not available/implemented. Let me look into this...


Jean Latour wrote:

(original question quoted in full; see the first message in this thread)




Re: [OMPI users] Spawn and Disconnect

2006-03-03 Thread Jean Latour

Just to add an example that may help with this "disconnect" discussion:
Attached is the code of a test that does the following (and it works perfectly with Open MPI 1.0.1):


1) master spawns slave1
2) master spawns slave2
3) exchange messages between master and slaves over the intercommunicators
4) slave1 disconnects from master and finalizes
5) slave2 disconnects from master and finalizes
(the processors used by slave1 and slave2 can now be re-used by newly spawned processes)
6) master spawns slave3, and then slave4
7) slave3 and slave4 have NO direct communicator, but they can create one through the MPI_Open_port mechanism and the MPI_Comm_connect / MPI_Comm_accept functions. The port name is relayed through the master.
8) slave3 and slave4 create this direct communicator and do some ping-pong over it (a minimal sketch of this exchange follows below)
9) slave3 and slave4 disconnect from each other on this direct communicator
10) slave3 and slave4 disconnect from master and finalize
11) master finalizes
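For steps 7) and 8), here is a minimal C sketch of the open-port exchange (this is not the attached code; the relay rank, tags and the elided ping-pong are illustrative):

#include <mpi.h>

/* slave3: opens a port, relays the port name to the master, and accepts. */
void slave3_side(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm parent, direct;

    MPI_Comm_get_parent(&parent);
    MPI_Open_port(MPI_INFO_NULL, port);
    /* relay the port name to the master (rank 0 of the remote group),
       which forwards it to slave4 over its own intercommunicator */
    MPI_Send(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0, parent);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &direct);
    /* ... ping-pong over "direct" ... */
    MPI_Comm_disconnect(&direct);
    MPI_Close_port(port);
}

/* slave4: receives the relayed port name from the master and connects. */
void slave4_side(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm parent, direct;

    MPI_Comm_get_parent(&parent);
    MPI_Recv(port, MPI_MAX_PORT_NAME, MPI_CHAR, 0, 0, parent,
             MPI_STATUS_IGNORE);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &direct);
    /* ... ping-pong over "direct" ... */
    MPI_Comm_disconnect(&direct);
}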

Hope it helps
Best regards,
Jean Latour

Ralph Castain wrote:

We expect to have much better support for the entire comm_spawn 
process in the next incarnation of the RTE. I don't expect that to be 
included in a release, however, until 1.1 (Jeff may be able to give 
you an estimate for when that will happen).


Jeff et al may be able to give you access to an early non-release 
version sooner, if better comm_spawn support is a critical issue and 
you don't mind being patient with the inevitable bugs in such versions.


Ralph


Edgar Gabriel wrote:

Open MPI currently does not fully support a proper disconnection of parent and child processes. Thus, if a child dies/aborts, the parents will abort as well, despite calling MPI_Comm_disconnect. (The new RTE will have better support for these operations; Ralph/Jeff can probably give a better estimate of when this will be available.)


However, what should not happen is that when the child calls MPI_Finalize (a proper shutdown rather than a violent death), the parent goes down at the same time. Let me check that as well...


Brignone, Sergio wrote:

 


Hi everybody,



I am trying to run a master/slave set.

Because of the nature of the problem I need to start and stop (kill) 
some slaves.


The problem is that as soon as one of the slaves dies, the master dies also.



This is what I am doing:



MASTER:

MPI_Init(...)
MPI_Comm_spawn(slave1,...,nslave1,...,intercomm1);
MPI_Barrier(intercomm1);
MPI_Comm_disconnect(&intercomm1);
MPI_Comm_spawn(slave2,...,nslave2,...,intercomm2);
MPI_Barrier(intercomm2);
MPI_Comm_disconnect(&intercomm2);
MPI_Finalize();

SLAVE:

MPI_Init(...)
MPI_Comm_get_parent(&intercomm);
(does something)
MPI_Barrier(intercomm);
MPI_Comm_disconnect(&intercomm);
MPI_Finalize();
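(For reference, a self-contained, hedged C version of this pattern; the executable names and process counts are placeholders standing in for the elided arguments above, not the actual code:)

/* master.c -- sketch, assuming the slave binaries are ./slave1 and ./slave2 */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm1, intercomm2;

    MPI_Init(&argc, &argv);

    /* first set of slaves */
    MPI_Comm_spawn("./slave1", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                   MPI_COMM_WORLD, &intercomm1, MPI_ERRCODES_IGNORE);
    MPI_Barrier(intercomm1);
    MPI_Comm_disconnect(&intercomm1);

    /* second set of slaves */
    MPI_Comm_spawn("./slave2", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                   MPI_COMM_WORLD, &intercomm2, MPI_ERRCODES_IGNORE);
    MPI_Barrier(intercomm2);
    MPI_Comm_disconnect(&intercomm2);

    MPI_Finalize();
    return 0;
}

/* slave.c -- the same source can serve as slave1 and slave2 */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&intercomm);
    /* ... does something ... */
    MPI_Barrier(intercomm);
    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}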







The issue is that as soon as the first set of slaves calls MPI_Finalize, the master dies also (it dies right after MPI_Comm_disconnect(&intercomm1)).






What am I doing wrong?



Thanks



Sergio








   





 









spawn+connect.tar.gz
Description: Binary data

[OMPI users] Performance of ping-pong using OpenMPI over Infiniband

2006-03-16 Thread Jean Latour

Hello,

Testing the performance of Open MPI over InfiniBand, I have the following results:


1) Hardware is: SilverStorm InfiniBand interface

2) Open MPI version is (from ompi_info):
                Open MPI: 1.0.2a9r9159
   Open MPI SVN revision: r9159
                Open RTE: 1.0.2a9r9159
   Open RTE SVN revision: r9159
                    OPAL: 1.0.2a9r9159
       OPAL SVN revision: r9159

3) Cluster of dual-processor Opteron 248 nodes at 2.2 GHz

Configure has been run with the option --with-mvapi=path-to-mvapi

4) A C-coded ping-pong gives the following values:

LOOPS: 1000 BYTES: 4096 SECONDS: 0.085557  MBytes/sec: 95.749051
LOOPS: 1000 BYTES: 8192 SECONDS: 0.050657  MBytes/sec: 323.429912
LOOPS: 1000 BYTES: 16384 SECONDS: 0.084038  MBytes/sec: 389.918757
LOOPS: 1000 BYTES: 32768 SECONDS: 0.163161  MBytes/sec: 401.665104
LOOPS: 1000 BYTES: 65536 SECONDS: 0.306694  MBytes/sec: 427.370561
LOOPS: 1000 BYTES: 131072 SECONDS: 0.529589  MBytes/sec: 494.995011
LOOPS: 1000 BYTES: 262144 SECONDS: 0.952616  MBytes/sec: 550.366583
LOOPS: 1000 BYTES: 524288 SECONDS: 1.927987  MBytes/sec: 543.870859
LOOPS: 1000 BYTES: 1048576 SECONDS: 3.673732  MBytes/sec: 570.850562
LOOPS: 1000 BYTES: 2097152 SECONDS: 9.993185  MBytes/sec: 419.716435
LOOPS: 1000 BYTES: 4194304 SECONDS: 18.211958  MBytes/sec: 460.609893
LOOPS: 1000 BYTES: 8388608 SECONDS: 35.421490  MBytes/sec: 473.645124
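(For reference, these figures are consistent with counting both directions of each exchange, i.e. MBytes/sec = 2 * LOOPS * BYTES / SECONDS / 10^6; for example, 2 * 1000 * 1048576 / 3.673732 / 10^6 is about 570.85.)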

My questions are:
a) Is Open MPI doing TCP/IP over IB in this case? (I guess so)
b) Is it possible to significantly improve these values by changing the defaults?
   I have used several mca btl parameters but without improving the maximum bandwidth.
   For example: --mca btl mvapi --mca btl_mvapi_max_send_size 8388608
c) Is it possible that other IB hardware implementations have better performance with Open MPI?
d) Is it possible to use specific IB drivers for optimal performance? (should reach almost 800 MB/sec)


Thank you very much for your help
Best Regards,
Jean Latour

<>

Re: [OMPI users] Performance of ping-pong using OpenMPI over Infiniband

2006-03-17 Thread Jean Latour

Following your advice and the FAQ pages, I have added the file
$(HOME)/.openmpi/mca-params.conf
with:

btl_mvapi_flags=6
mpi_leave_pinned=1
pml_ob1_leave_pinned_pipeline=1
mpool_base_use_mem_hooks=1

The parameter btl_mvapi_eager_limit gives the best results when set to 8 KB or 16 KB.
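For example, assuming the same mca-params.conf file as above, this amounts to one extra line (the value shown is the 16 KB setting found best here):

btl_mvapi_eager_limit=16384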

The ping-pong test result is now:

LOOPS: 1000 BYTES: 4096 SECONDS: 0.085643  MBytes/sec: 95.652825
LOOPS: 1000 BYTES: 8192 SECONDS: 0.050893  MBytes/sec: 321.931400
LOOPS: 1000 BYTES: 16384 SECONDS: 0.106791  MBytes/sec: 306.842281
LOOPS: 1000 BYTES: 32768 SECONDS: 0.154873  MBytes/sec: 423.159259
LOOPS: 1000 BYTES: 65536 SECONDS: 0.250849  MBytes/sec: 522.513526
LOOPS: 1000 BYTES: 131072 SECONDS: 0.443162  MBytes/sec: 591.530910
LOOPS: 1000 BYTES: 262144 SECONDS: 0.827640  MBytes/sec: 633.473448
LOOPS: 1000 BYTES: 524288 SECONDS: 1.596701  MBytes/sec: 656.714101
LOOPS: 1000 BYTES: 1048576 SECONDS: 3.134974  MBytes/sec: 668.953554
LOOPS: 1000 BYTES: 2097152 SECONDS: 6.210786  MBytes/sec: 675.325785
LOOPS: 1000 BYTES: 4194304 SECONDS: 12.384103  MBytes/sec: 677.369053
LOOPS: 1000 BYTES: 8388608 SECONDS: 27.377714  MBytes/sec: 612.805580

which is exactly what we can also get with MVAPICH on the same network.

Since we do NOT have PCI-X hardware, I believe this is the maximum we can get from this hardware.

Thanks a lot for your explanations on this tuning of Open MPI.
Best Regards,
Jean

George Bosilca wrote:


On Thu, 16 Mar 2006, Jean Latour wrote:

 


My questions are:
a) Is Open MPI doing TCP/IP over IB in this case? (I guess so)
   



If the path to the mvapi library is correct, then Open MPI will use mvapi, not TCP over IB. There is a simple way to check: "ompi_info --param btl mvapi" will print all the parameters attached to the mvapi driver. If there is no mvapi in the output, then mvapi was not correctly detected. But I don't think that's the case, because if I remember correctly we have a protection at configure time: if you specify one of the drivers and we are not able to correctly use the libraries, we stop the configure.



 

b) Is it possible to significantly improve these values by changing the defaults?
   



By default we are using a very conservative approach: we never leave the memory pinned down, and that decreases the performance for a ping-pong. There are pros and cons, too long to explain here, but in general we see better performance for real-life applications with our default approach, and that's our main goal.


Now, if you want to get better performance for the ping-pong test please 
read the FAQ at http://www.open-mpi.org/faq/?category=infiniband.


These are the 3 flags that affect the mvapi performance for the ping-pong 
case (add them in $(HOME)/.openmpi/mca-params.conf):

btl_mvapi_flags=6
mpi_leave_pinned=1
pml_ob1_leave_pinned_pipeline=1

 

I have used several mca btl parameters but without improving the maximum bandwidth.

For example: --mca btl mvapi --mca btl_mvapi_max_send_size 8388608
   



It is difficult to improve the maximum bandwidth without leave_pinned activated, but you can improve the bandwidth for medium-size messages. Play with btl_mvapi_eager_limit to set the limit between the short (eager) and rendezvous protocols. "ompi_info --param btl mvapi" will give you a full list of parameters as well as their descriptions.
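For instance, a possible way to try a different value on the command line (the binary name and the value are only illustrative, and "self" is added for loopback communication):

mpirun --mca btl mvapi,self --mca btl_mvapi_eager_limit 16384 -np 2 ./pingpong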


 


c) Is it possible that other IB hardware implementations have better performance with Open MPI?
   



The maximum bandwidth depends on several factors. One of the most important is the maximum bandwidth of your node's bus. To reach 800 MB/s and more you definitely need a PCI-X 16 ...


 

d) Is it possible to use specific IB drivers  for optimal performance  ? 
(should reach almost 800 MB/sec)
   



Once the 3 options are set, you should see an improvement in the bandwidth.


Let me know if it does not solve your problems.

  george.

"We must accept finite disappointment, but we must never lose infinite
hope."
  Martin Luther King


 




Re: [OMPI users] How to establish communication between two separate COM WORLD

2006-03-27 Thread Jean Latour

Hello,

It seems to me there is only one way to create a communication path between two MPI_COMM_WORLDs: use MPI_Open_port (the resulting port name encodes a specific IP + port address), and then MPI_Comm_connect / MPI_Comm_accept.

In order to ease communicating the port name, using MPI_Publish_name / MPI_Lookup_name is also possible, with the constraint that the "publish" must be done before the "lookup", and this involves some synchronization between the processes anyway.
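A minimal C sketch of that pattern (the service name and the choice of which side publishes are illustrative assumptions, not tied to any particular configuration):

#include <mpi.h>

/* Run by one process of the first MPI_COMM_WORLD. */
void server_side(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("my_service", MPI_INFO_NULL, port);  /* publish first */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    /* ... communicate over the new intercommunicator ... */
    MPI_Comm_disconnect(&inter);
    MPI_Unpublish_name("my_service", MPI_INFO_NULL, port);
    MPI_Close_port(port);
}

/* Run by one process of the second MPI_COMM_WORLD, after the publish. */
void client_side(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    MPI_Lookup_name("my_service", MPI_INFO_NULL, port);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    /* ... communicate over the new intercommunicator ... */
    MPI_Comm_disconnect(&inter);
}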

Simple examples can be found in the handbook on MPI : "Using MPI-2"
by William Gropp et al.

Best Regards,
Jean

Ali Eghlima wrote:




Hello,

I have read the MPI-2 documents as well as the FAQ. I am confused about the best way to establish communication between two MPI_COMM_WORLDs that have been created by two mpiexec calls on the same node.


mpiexec -conf config1
  This starts 20 processes on 7 nodes

mpiexec -conf config2
  This starts 18 processes on 5 nodes

I would appreciate any comments or pointers to a document or example.

Thanks

Ali,

 






