[OMPI users] How does OpenMPI decide which algorithm to use in MPI_Bcast?

2009-09-03 Thread shan axida
Hi, 
I had a glance at the Open MPI source code and saw that there are several 
algorithms for the MPI_Bcast function.
My question is: how is the algorithm chosen for a given MPI_Bcast call? 
By message size? 
Could anyone give me a little more detailed information on this?

Thanks a lot.

Axida
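
As far as I understand it, the choice is made inside the "tuned" collective 
component (under ompi/mca/coll/tuned in the source tree): a fixed decision 
function looks at both the communicator size and the message size and picks 
linear, binomial tree, split-binary, pipeline, etc. accordingly. The cutoffs can 
be overridden at run time with MCA parameters (something like 
coll_tuned_use_dynamic_rules and coll_tuned_bcast_algorithm -- check ompi_info 
for the exact names in your version). A rough sketch of the *kind* of rule 
involved; the thresholds below are invented for illustration and are not 
Open MPI's real cutoffs:

#include <stdio.h>
#include <stddef.h>

enum bcast_alg { BCAST_BINOMIAL, BCAST_SPLIT_BINARY, BCAST_PIPELINE };

/* Invented thresholds, purely to illustrate the shape of a fixed
 * decision rule driven by communicator size and message size. */
static enum bcast_alg pick_bcast_alg(int comm_size, size_t msg_bytes)
{
    if (msg_bytes < 4096 || comm_size <= 8)
        return BCAST_BINOMIAL;       /* small: latency-bound, use a tree */
    if (msg_bytes < 512 * 1024)
        return BCAST_SPLIT_BINARY;   /* medium messages */
    return BCAST_PIPELINE;           /* large: bandwidth-bound, pipeline/chain */
}

int main(void)
{
    const char *names[] = { "binomial", "split-binary", "pipeline" };
    size_t sizes[] = { 1024, 65536, 1048576, 134217728 };
    for (int i = 0; i < 4; i++)
        printf("64 ranks, %10zu bytes -> %s\n",
               sizes[i], names[pick_bcast_alg(64, sizes[i])]);
    return 0;
}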





[OMPI users] SHARED Memory

2009-04-22 Thread shan axida
Hi,

Does anybody know how to make use of shared memory in the Open MPI implementation?

Thanks





Re: [OMPI users] SHARED Memory

2009-04-23 Thread shan axida
Hi,
What I am asking is: if I use MPI_Send and MPI_Recv between processes on 
a node, does that mean shared memory is being used or not? If not, how can I 
use shared memory among processes that are running on the same node?


Thank you!





From: Eugene Loh 
To: Open MPI Users 
Sent: Thursday, April 23, 2009 1:20:05 PM
Subject: Re: [OMPI users] SHARED Memory

Just to clarify (since "send to self" strikes me as confusing)...

If you're talking about using shared memory for point-to-point MPI
message passing, OMPI typically uses it automatically between two
processes on the same node.  It is *not* used for a process sending to
itself.  There is a well-written FAQ (in my arrogant opinion!) at
http://www.open-mpi.org/faq/?category=sm -- e.g.,
http://www.open-mpi.org/faq/?category=sm#sm-btl .

If you're talking about some other use of shared memory, let us know
what you had in mind.
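
To make that concrete, here is a minimal sketch (message size and iteration
count are arbitrary): a plain ping-pong between ranks 0 and 1, with nothing
shared-memory-specific in the code. When both ranks land on the same node,
Open MPI will normally carry these messages over the sm BTL automatically.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    enum { N = 1024, ITERS = 1000 };    /* arbitrary sizes for the sketch */
    double buf[N];
    for (int i = 0; i < N; i++) buf[i] = i;

    double t0 = MPI_Wtime();
    for (int iter = 0; iter < ITERS; iter++) {
        if (rank == 0) {
            MPI_Send(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)
        printf("average round trip: %g us\n",
               (MPI_Wtime() - t0) / ITERS * 1e6);

    MPI_Finalize();
    return 0;
}

Running it with both processes on one host should be markedly faster than
across two hosts; forcing "--mca btl self,sm" (no tcp) is one way to confirm
shared memory is actually in use, since the job would fail to find a path if
the two ranks were not on the same node.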

Elvedin Trnjanin wrote: 
Shared memory is used for send-to-self scenarios such as if you're
making use of multiple slots on the same machine.

shan axida wrote: 
Any body know how to make use of shared memory in OpenMPI
implementation?


  

Re: [OMPI users] SHARED Memory

2009-04-23 Thread shan axida
Hi,
I have read that FAQ. 
Does it mean that shared memory communication is used by default when sending 
messages between processes on the same node?
Is there no need for any options or configuration to get Open MPI shared memory?

THANK YOU!





From: Eugene Loh 
To: Open MPI Users 
Sent: Thursday, April 23, 2009 2:08:33 PM
Subject: Re: [OMPI users] SHARED Memory

shan axida wrote: 
What
I am asking is if I use MPI_Send and MPI_Recv between processes in  
a node, does it mean using shared memory or not?
It (typically) does.  (Some edge cases could occur.)  Your question is
addressed by the FAQ I mentioned.

if not, how to use 
shared memory among processes which are runing in a node?


From: Eugene Loh 
To: Open MPI Users 
Sent: Thursday, April 23, 2009 1:20:05 PM
Subject: Re: [OMPI users] SHARED Memory

Just to clarify (since "send to self" strikes me as confusing)...

If you're talking about using shared memory for point-to-point MPI
message passing, OMPI typically uses it automatically between two
processes on the same node.  It is *not* used for a process sending to
itself.  There is a well-written FAQ (in my arrogant opinion!) at
http://www.open-mpi.org/faq/?category=sm -- e.g.,
http://www.open-mpi.org/faq/?category=sm#sm-btl .

If you're talking about some other use of shared memory, let us know
what you had in mind.

Elvedin Trnjanin wrote: 
Shared memory is used for send-to-self scenarios such as if you're
making use of multiple slots on the same machine.

shan axida wrote: 
Any body know how to make use of shared memory in OpenMPI
implementation?



  

[OMPI users] MPI_Bcast from OpenMPI

2009-04-23 Thread shan axida
Hi, 
One more question:
I have executed MPI_Bcast() with 64 processes on a 16-node Ethernet cluster 
with multiple links.
The result is shown in the file attached to this e-mail.
What is going on at a message size of 131072 doubles?
I have executed it many times, but the result is still the same.

THANK YOU!
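
For reference, a timing loop of the following shape (a sketch, not the exact
code used for the attached figure) is a reasonable way to reproduce the
measurement: barrier first, time the broadcast, then take the maximum across
ranks so that the root returning early does not skew individual data points.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 131072;                     /* doubles: 1 MB payload */
    double *buf = calloc(count, sizeof(double));

    MPI_Barrier(MPI_COMM_WORLD);                  /* start everyone together */
    double t0 = MPI_Wtime();
    MPI_Bcast(buf, count, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    double local = MPI_Wtime() - t0, worst = 0.0;
    MPI_Reduce(&local, &worst, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d doubles: %.6f s (max over ranks)\n", count, worst);

    free(buf);
    MPI_Finalize();
    return 0;
}

If a spike at one particular size survives this kind of measurement, averaged
over several iterations, it is more likely an algorithm switch-over or a
network effect than a timing artifact.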


  

openmpi.pdf
Description: Adobe PDF document




Re: [OMPI users] MPI_Bcast from OpenMPI

2009-04-23 Thread shan axida
Hi,
Hardware setup:
+ We have 4 NICs on each node in our cluster; that is why I called it 4 links.
+ All nodes are connected by 4 switches (1 Gb switches).
+ 4 GB of memory per node.
How can I check whether the memory access is NUMA or UMA?

Thank you!
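
One quick way to check is to run "numactl --hardware" on a node: if it reports
more than one memory node, the machine is NUMA. If you would rather check from
code, here is a small sketch that assumes libnuma is installed (build with -lnuma):

#include <stdio.h>
#include <numa.h>       /* libnuma; link with -lnuma */

int main(void)
{
    if (numa_available() < 0) {
        printf("libnuma reports no NUMA support: treat this node as UMA\n");
        return 0;
    }
    int max_node = numa_max_node();   /* highest NUMA node id on this machine */
    printf("highest NUMA node id: %d -> %s\n", max_node,
           max_node > 0 ? "NUMA (multiple memory nodes)"
                        : "UMA (single memory node)");
    return 0;
}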





From: Jeff Squyres 
To: Open MPI Users 
Sent: Thursday, April 23, 2009 8:23:52 PM
Subject: Re: [OMPI users] MPI_Bcast from OpenMPI

Very strange; 6 seconds for a 1MB broadcast over 64 processes is *way* too 
long.  Even 2.5 sec at 2MB seems too long -- what is your network speed?  I'm 
not entirely sure what you mean by "4 link" on your graph.

Without more information, I would first check your hardware setup to see if 
there's some kind of network buffering / congestion issue occurring.  Here's a 
total guess: your ethernet switch(es) are low quality (from an HPC perspective, 
at least) such that you're incurring congestion and/or retransmission at that 
size for some reason.

You could also be running up against memory bus congestion (I assume you mean 4 
cores per node; are they NUMA or UMA?).  But that wouldn't account for the huge 
spike at 1MB.


On Apr 23, 2009, at 1:32 AM, shan axida wrote:

> Hi,
> One more question:
> I have executed the MPI_Bcast() in 64 processes in 16 nodes Ethernet multiple 
> links cluster.
> The result is shown in the file attached on this E-mail.
> What is going on at 131072 double message size?
> I have executed it many times but the result is still the same.
> 
> THANK YOU!


--Jeff Squyres
Cisco Systems




  

Re: [OMPI users] MPI_Bcast from OpenMPI

2009-04-23 Thread shan axida
Sorry, I made a mistake in the calculation.
It is not 131072 doubles but 131072 KB,
which is around 128 MB.
 




From: Jeff Squyres 
To: Open MPI Users 
Sent: Thursday, April 23, 2009 8:23:52 PM
Subject: Re: [OMPI users] MPI_Bcast from OpenMPI

Very strange; 6 seconds for a 1MB broadcast over 64 processes is *way* too 
long.  Even 2.5 sec at 2MB seems too long -- what is your network speed?  I'm 
not entirely sure what you mean by "4 link" on your graph.

Without more information, I would first check your hardware setup to see if 
there's some kind of network buffering / congestion issue occurring.  Here's a 
total guess: your ethernet switch(es) are low quality (from an HPC perspective, 
at least) such that you're incurring congestion and/or retransmission at that 
size for some reason.

You could also be running up against memory bus congestion (I assume you mean 4 
cores per node; are they NUMA or UMA?).  But that wouldn't account for the huge 
spike at 1MB.


On Apr 23, 2009, at 1:32 AM, shan axida wrote:

> Hi,
> One more question:
> I have executed the MPI_Bcast() in 64 processes in 16 nodes Ethernet multiple 
> links cluster.
> The result is shown in the file attached on this E-mail.
> What is going on at 131072 double message size?
> I have executed it many times but the result is still the same.
> 
> THANK YOU!


--Jeff Squyres
Cisco Systems




  

Re: [OMPI users] MPI_Bcast from OpenMPI

2009-04-23 Thread shan axida
But exactly the same program gets a different result on another cluster:
I mean the result does not have any spike at all.
The second cluster has almost the same specifications as the first, except 
slightly less memory and a slightly lower clock frequency.
First cluster: 3.0 GHz Intel Xeon, 4 GB memory, CentOS 4.6.
Second cluster: 2.8 GHz Intel Xeon, 3 GB memory, Fedora Core 5.
Open MPI 1.3 is used on both clusters.







From: Eugene Loh 
To: Open MPI Users 
Sent: Friday, April 24, 2009 1:26:14 AM
Subject: Re: [OMPI users] MPI_Bcast from OpenMPI

Okay.  So, going back to Jeff's second surprise, we have 256 Mbyte/2.5
sec = 100 Mbyte/sec = 1 Gbit/sec (sloppy math).  So, without getting
into details of what we're measuring/reporting here, there doesn't on
the face of it appear to be anything wrong with the baseline
performance.  Jeff was right that 256K doubles should have been faster,
but 256 Mbyte... seems reasonable.

So, the remaining mystery is the 6x or so spike at 128 Mbyte.  Dunno. 
How important is it to resolve that mystery?

shan axida wrote: 
Sorry, I had a mistake in calculation.
Not 131072 (double) but 131072 KB.
It means around 128 MB.
 
From: Jeff Squyres 
To: Open MPI Users 
Sent: Thursday, April 23, 2009 8:23:52 PM
Subject: Re: [OMPI users] MPI_Bcast from OpenMPI


Very strange; 6 seconds for a 1MB broadcast over 64 processes is *way*
too long.  Even 2.5 sec at 2MB seems too long


  

Re: [OMPI users] MPI_Bcast from OpenMPI

2009-04-24 Thread shan axida
Thank you, Eugene Loh.
It is very important for me to explain the spike in the figure!
But I don't know how to hunt down the reason or how to check it.
Would you please give me some more practical help?


Thank you again.






From: Eugene Loh 
To: Open MPI Users 
Sent: Friday, April 24, 2009 2:16:22 PM
Subject: Re: [OMPI users] MPI_Bcast from OpenMPI

Right.  So, baseline performance seems reasonable, but there is an odd
spike that seems difficult to explain.  This is annoying, but again: 
how important is it to resolve that mystery?  You can spend a few days
trying to hunt this down, only to find that it's some oddity that has
no general relevance.  I don't know if that's really the case, but I'm
just suggesting that it may make the most sense just to let this one go.

shan axida wrote: 
But, exactly the same program gets different result in another
cluster.
I mean the result doent have any spike at all.
Second cluster is almost the same features with the previous one

From: Eugene Loh 
To: Open MPI Users 
Sent: Friday, April 24, 2009 1:26:14 AM
Subject: Re: [OMPI users] MPI_Bcast from OpenMPI


So, the remaining mystery is the 6x or so spike at 128 Mbyte.  Dunno. 
How important is it to resolve that mystery?


  

[OMPI users] OpenMPI MPI_Bcast Algorithms

2009-04-28 Thread shan axida
Hi all,
I think there are several algorithms used in MPI_Bcast.
I am wondering how it is decided which one gets executed.
I mean, how is it decided which algorithm will be used?
Does it depend on the message size or something else?
Could somebody help me?

Thank you!



  

[OMPI users] How to configure NIS and MPI on separate NICs?

2009-05-12 Thread shan axida
Hello all,
I want to configure NIS and MPI to use different networks.
For example, NIS uses eth0 and MPI uses eth1, something like that.
How can I do that?


Axida




[OMPI users] How to use multiple links with OpenMPI?

2009-05-26 Thread shan axida
Hi everyone,
I want to ask how to use multiple links (multiple NICs) with OpenMPI.
For example, how can I assign a link to each process, if there are 4 links 
and 4 processors on each node in our cluster?
Is this the correct way?
hostfile:
--
host1-eth0 slots=1
host1-eth1 slots=1
host1-eth2 slots=1
host1-eth3 slots=1
host2-eth0 slots=1
host2-eth1 slots=1
host2-eth2 slots=1
host2-eth3 slots=1
... ...
...  ...
host16-eth0 slots=1
host16-eth1 slots=1
host16-eth2 slots=1
host16-eth3 slots=1



  

Re: [OMPI users] How to use multiple links with OpenMPI?

2009-05-28 Thread shan axida
Thank you, Mr. Jeff Squyres.
I have conducted a simple MPI_Bcast experiment on our cluster.
The results are shown in the file attached to this e-mail.
The hostfile is:
-
hostname1 slots=4
hostname2 slots=4
hostname3 slots=4


hostname16 slots=4
-
As we can see in the figure, it is a little faster than a single link
when we use 2, 3, or 4 links between nodes.
My question is: what would be the reason for getting almost the same 
performance whether we use 2, 3, or 4 links?

Thank you!

Axida







From: Jeff Squyres 
To: Open MPI Users 
Sent: Wednesday, May 27, 2009 11:28:42 PM
Subject: Re: [OMPI users] How to use multiple links with OpenMPI?

Open MPI considers hosts differently than network links.

So you should only list the actual hostname in the hostfile, with slots equal 
to the number of processors (4 in your case, I think?).

Once the MPI processes are launched, they each look around on the host that 
they're running on and find network paths to each of their peers.  If there are 
multiple paths between pairs of peers, Open MPI will round-robin stripe 
messages across each of the links.  We don't really have an easy setting for 
each peer pair only using 1 link.  Indeed, since connectivity is bidirectional, 
the traffic patterns become less obvious if you want MPI_COMM_WORLD rank X to 
only use link Y -- what does that mean to the other 4 MPI processes on the 
other host (with whom you have assumedly assigned their own individual links as 
well)?


On May 26, 2009, at 12:24 AM, shan axida wrote:

> Hi everyone,
> I want to ask how to use multiple links (multiple NICs) with OpenMPI.
> For example, how can I assign a link to each process, if there are 4 links
> and 4 processors on each node in our cluster?
> Is this a correct way?
> hostfile:
> --
> host1-eth0 slots=1
> host1-eth1 slots=1
> host1-eth2 slots=1
> host1-eth3 slots=1
> host2-eth0 slots=1
> host2-eth1 slots=1
> host2-eth2 slots=1
> host2-eth3 slots=1
> ... ...
> ...  ...
> host16-eth0 slots=1
> host16-eth1 slots=1
> host16-eth2 slots=1
> host16-eth3 slots=1


--Jeff Squyres
Cisco Systems




  

MPI_Bcast-ypc05xx.pdf
Description: Adobe PDF document


Re: [OMPI users] How to use multiple links with OpenMPI?

2009-05-29 Thread shan axida
Hi Mr. Jeff Squyres,
Is it possible to use bidirectional communication with MPI on an Ethernet cluster?
I have tried it once (I thought it should be possible because the switches are 
full duplex). However, I could not get the bandwidth improvement I was expecting.

If the answer is YES, would you please show me some pseudocode for 
bidirectional communication?

Thank you.
Axida 
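
For concreteness, "bidirectional" here just means each peer has a receive and
a send in flight at the same time, e.g. with nonblocking calls. A minimal
sketch (arbitrary message size, no error checking, ranks paired 0-1, 2-3, ...):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1 << 20;                 /* 1 Mi doubles = 8 MB, arbitrary */
    double *sendbuf = calloc(N, sizeof(double));
    double *recvbuf = calloc(N, sizeof(double));
    int peer = rank ^ 1;                   /* pair up ranks 0-1, 2-3, ... */

    if (peer < size) {
        MPI_Request reqs[2];
        double t0 = MPI_Wtime();
        /* Post the receive and the send together, so data flows in
         * both directions over the link at the same time. */
        MPI_Irecv(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        double t1 = MPI_Wtime();
        printf("rank %d <-> rank %d: %.1f MB/s each way\n", rank, peer,
               N * sizeof(double) / (t1 - t0) / 1e6);
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

On full-duplex gigabit Ethernet the hope is roughly 100 MB/s in each direction
simultaneously; whether that actually materializes depends on the NIC, the
driver, and the bus, which is the kind of question discussed in this thread.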






From: Jeff Squyres 
To: Open MPI Users 
Sent: Wednesday, May 27, 2009 11:28:42 PM
Subject: Re: [OMPI users] How to use multiple links with OpenMPI?

Open MPI considers hosts differently than network links.

So you should only list the actual hostname in the hostfile, with slots equal 
to the number of processors (4 in your case, I think?).

Once the MPI processes are launched, they each look around on the host that 
they're running and find network paths to each of their peers.  If they are 
multiple paths between pairs of peers, Open MPI will round-robin stripe 
messages across each of the links.  We don't really have an easy setting for 
each peer pair only using 1 link.  Indeed, since connectivity is bidirectional, 
the traffic patterns become less obvious if you want MPI_COMM_WORLD rank X to 
only use link Y -- what does that mean to the other 4 MPI processes on the 
other host (with whom you have assumedly assigned their own individual links as 
well)?


On May 26, 2009, at 12:24 AM, shan axida wrote:

> Hi everyone,
> I want to ask how to use multiple links (multiple NICs) with OpenMPI.
> For example, how can I assign a link to each process, if there are 4 links
> and 4 processors on each node in our cluster?
> Is this a correct way?
> hostfile:
> --
> host1-eth0 slots=1
> host1-eth1 slots=1
> host1-eth2 slots=1
> host1-eth3 slots=1
> host2-eth0 slots=1
> host2-eth1 slots=1
> host2-eth2 slots=1
> host2-eth3 slots=1
> ... ...
> ...  ...
> host16-eth0 slots=1
> host16-eth1 slots=1
> host16-eth2 slots=1
> host16-eth3 slots=1


--Jeff Squyres
Cisco Systems




  

Re: [OMPI users] How to use multiple links with OpenMPI?

2009-06-04 Thread shan axida
Hi Jeff Squyres,
We have Dell PowerConnect 2724 Gigabit switches connecting the nodes in our 
cluster.
As you said, maybe the speed of the PCI bus is the bottleneck.
How can I check that in practice? 
What is your suggestion for this problem?

Thank you!
Axida






From: Jeff Squyres 
To: Open MPI Users 
Sent: Tuesday, June 2, 2009 10:15:39 AM
Subject: Re: [OMPI users] How to use multiple links with OpenMPI?

Note that striping doesn't really help you much until data sizes get large.  
For example, networks tend to have an elbow in the graph where the size of the 
message starts to matter (clearly evident on your graphs).

Additionally, you have your network marked with "hubs", not "switches" -- if 
you really do have hubs and not switches, you may run into serious contention 
issues if you start loading up the network.

With both of these factors, even though you have 4 links, you likely aren't 
going to see much of a performance benefit until you send large messages (which 
will be limited by your bus speeds -- can you feed all 4 of your links from a 
single machine at line rate, or will you be limited by PCI bus speeds and 
contention?), and you may run into secondary performance issues due to 
contention on your hubs.


On May 28, 2009, at 11:06 PM, shan axida wrote:

> Thank you! Mr. Jeff Squyres,
> I have conducted a simple MPI_Bcast experiment in out cluster.
> The results are shown in the file attached on this e-mail.
> The hostfile is :
> -
> hostname1 slots=4
> hostname2 slots=4
> hostname3 slots=4
> 
> 
> hostname16 slots=4
> -
> As we can see in the figure, it is little faster than single link
> when we use 2,3,4 links between nodes.
> My question is what would be the reason to make almost the same
> performance when we use 2,3,4 links ?
> 
> Thank you!
> 
> Axida
> 
> 
> 
> 
> From: Jeff Squyres 
> To: Open MPI Users 
> Sent: Wednesday, May 27, 2009 11:28:42 PM
> Subject: Re: [OMPI users] How to use Multiple links with 
> OpenMPI??
> 
> Open MPI considers hosts differently than network links.
> 
> So you should only list the actual hostname in the hostfile, with slots equal 
> to the number of processors (4 in your case, I think?).
> 
> Once the MPI processes are launched, they each look around on the host that 
> they're running and find network paths to each of their peers.  If they are 
> multiple paths between pairs of peers, Open MPI will round-robin stripe 
> messages across each of the links.  We don't really have an easy setting for 
> each peer pair only using 1 link.  Indeed, since connectivity is 
> bidirectional, the traffic patterns become less obvious if you want 
> MPI_COMM_WORLD rank X to only use link Y -- what does that mean to the other 
> 4 MPI processes on the other host (with whom you have assumedly assigned 
> their own individual links as well)?
> 
> 
> On May 26, 2009, at 12:24 AM, shan axida wrote:
> 
> > Hi everyone,
> > I want to ask how to use multiple links (multiple NICs) with OpenMPI.
> > For example, how can I assign a link to each process, if there are 4 links
> > and 4 processors on each node in our cluster?
> > Is this a correct way?
> > hostfile:
> > --
> > host1-eth0 slots=1
> > host1-eth1 slots=1
> > host1-eth2 slots=1
> > host1-eth3 slots=1
> > host2-eth0 slots=1
> > host2-eth1 slots=1
> > host2-eth2 slots=1
> > host2-eth3 slots=1
> > ......
> > ...  ...
> > host16-eth0 slots=1
> > host16-eth1 slots=1
> > host16-eth2 slots=1
> > host16-eth3 slots=1
> > 
> 
> 
> --Jeff Squyres
> Cisco Systems
> 


--Jeff Squyres
Cisco Systems






Re: [OMPI users] How to use multiple links with OpenMPI?

2009-06-08 Thread shan axida
Hi,
Yes, 2 of the NICs are on the same bus and the other 2 are embedded.
We did the netperf experiment on our cluster, and we 
could not get full bandwidth using 4 pairs of copies on two nodes.
The bandwidth increases when the number of NICs goes from 1 to 2,
but there is no big increase when it becomes 3 or 4.

Thank you!
Axida.






From: Jeff Squyres 
To: Open MPI Users 
Sent: Friday, June 5, 2009 11:19:02 PM
Subject: Re: [OMPI users] How to use multiple links with OpenMPI?

On Jun 4, 2009, at 3:42 AM, shan axida wrote:

> We have Dell powerconnect 2724 Gigabit switches to connect the nodes in our 
> cluster.
> As you said, may be the speed of PCI bus is a bottleneck.
> How can check it in practical?

Are all your gige nics on the same bus?

You might want to try running multiple copies of TCP pt2pt benchmarks 
simultaneously on your machine to see what kind of performance you get.  E.g., 
run 4 copies of netperf on node A talking to 4 corresponding copies of netperf 
on node B.  Do you get full bandwidth out of all 4 copies?

--Jeff Squyres
Cisco Systems






Re: [OMPI users] "Re: Best way to overlap computation and transfer using MPI over TCP/Ethernet?"

2009-06-08 Thread shan axida
Hi, 
Would you please tell me, in a little more detail, how you did the experiment of 
calling MPI_Test?

Thanks!






From: Lars Andersson 
To: us...@open-mpi.org
Sent: Tuesday, June 9, 2009 6:11:11 AM
Subject: Re: [OMPI users] "Re: Best way to overlap computation and transfer 
using MPI over TCP/Ethernet?"

On Mon, Jun 8, 2009 at 11:07 PM, Lars Andersson wrote:
> I'd say that your own workaround here is to intersperse MPI_TEST's
> periodically. This will trigger OMPI's pipelined protocol for large
> messages, and should allow partial bursts of progress while you're
> assumedly off doing useful work. If this is difficult because the
> work is being done in library code that you can't change, then perhaps
> a pre-spawned "work" thread could be used to call MPI_TEST
> periodically. That way, it won't steal huge amounts of CPU cycles
> (like MPI_WAIT would). You still might get some cache thrashing,
> context switching, etc. -- YMMV.

Thanks Jeff, it's good to hear that this is a valid workaround. I've
done a few small experiments, and by calling MPI_Test in a while loop
with a usleep(1000) I'm able to get almost full bandwidth for large
messages with less than 5% CPU utilization.

/Lars
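
In case a concrete version helps, this is roughly what that loop looks like
(a sketch with made-up sizes and a placeholder for the real computation):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* usleep */

static void do_some_work(void)
{
    /* placeholder for the real computation being overlapped */
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 16 * 1024 * 1024;       /* 16 Mi doubles = 128 MB, arbitrary */
    double *buf = calloc(N, sizeof(double));
    MPI_Request req;
    int done = 0;

    if (rank == 0)
        MPI_Isend(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
    else if (rank == 1)
        MPI_Irecv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
    else
        done = 1;                         /* other ranks have nothing to do here */

    while (!done) {
        do_some_work();                           /* overlap: useful work between polls */
        MPI_Test(&req, &done, MPI_STATUS_IGNORE); /* lets the pipelined protocol progress */
        if (!done)
            usleep(1000);                         /* keep CPU usage low between polls */
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

The usleep() keeps the polling loop from burning a full core the way a plain
MPI_Wait busy-loop would, while each MPI_Test call still gives Open MPI a
chance to push the next chunk of the message.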