Re: [OMPI users] Shared Memory Performance Problem.

2011-03-30 Thread Tim Prince
On 3/30/2011 10:08 AM, Eugene Loh wrote: Michele Marena wrote: I've launched my app with mpiP both when the two processes are on different nodes and when they are on the same node. Process 0 is the manager (gathers the results only), processes 1 and 2 are workers (compute). This is the

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-30 Thread Eugene Loh
Michele Marena wrote: I've launched my app with mpiP both when the two processes are on different nodes and when they are on the same node. Process 0 is the manager (gathers the results only), processes 1 and 2 are workers (compute). This is the case processes 1 and 2 a

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-30 Thread Michele Marena
Hi Jeff, thank you for your help. I've launched my app with mpiP both when the two processes are on different nodes and when they are on the same node. Process 0 is the manager (gathers the results only), processes 1 and 2 are workers (compute). This is the case processes 1 and 2 are o

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-30 Thread Jeff Squyres
How many messages are you sending, and how large are they? I.e., if your message passing is tiny, then the network transport may not be the bottleneck here. On Mar 28, 2011, at 9:41 AM, Michele Marena wrote: > I run ompi_info --param btl sm and this is the output > > MCA btl

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Michele Marena
I ran ompi_info --param btl sm and this is the output:
MCA btl: parameter "btl_base_debug" (current value: "0")
    If btl_base_debug is 1 standard debug is output, if > 1 verbose debug is output
MCA btl: parameter "btl" (current value: )

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Ralph Castain
The fact that this exactly matches the time you measured with shared memory is suspicious. My guess is that you aren't actually using shared memory at all. Does your "ompi_info" output show shared memory as being available? Jeff or others may be able to give you some params that would let you ch

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Michele Marena
What happens with 2 processes on the same node with tcp? With --mca btl self,tcp my app runs in 23s. 2011/3/28 Jeff Squyres (jsquyres) > Ah, I didn't catch before that there were more variables than just tcp vs. > shmem. > > What happens with 2 processes on the same node with tcp? > > Eg, when b

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Tim Prince
On 3/28/2011 3:29 AM, Michele Marena wrote: Each node has two processors (no dual-core). This seems to imply that the 2 processors share memory space and a single memory bus, and the question is not about what I originally guessed. -- Tim Prince

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Tim Prince
On 3/28/2011 3:44 AM, Jeff Squyres (jsquyres) wrote: Ah, I didn't catch before that there were more variables than just tcp vs. shmem. What happens with 2 processes on the same node with tcp? Eg, when both procs are on the same node, are you thrashing caches or memory? In fact, I made the gues

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Jeff Squyres (jsquyres)
Ah, I didn't catch before that there were more variables than just tcp vs. shmem. What happens with 2 processes on the same node with tcp? Eg, when both procs are on the same node, are you thrashing caches or memory? Sent from my phone. No type good. On Mar 28, 2011, at 6:27 AM, "Michele Mar

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Michele Marena
Each node has two processors (no dual-core). 2011/3/28 Michele Marena > However, I thank you Tim, Ralph and Jeff. > My sequential application runs in 24s (wall clock time). > My parallel application runs in 13s with two processes on different nodes. > With shared memory, when two processes are o

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Michele Marena
However, I thank you Tim, Ralph and Jeff. My sequential application runs in 24s (wall clock time). My parallel application runs in 13s with two processes on different nodes. With shared memory, when two processes are on the same node, my app runs in 23s. I don't understand why. 2011/3/28 Jeff Squyr
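For reference, the wall-clock times quoted in this message work out to the speedups below. This is only arithmetic on the reported numbers (24s sequential, 13s with the two processes on different nodes, 23s on the same node). The function name is illustrative, and the efficiency figure assumes that only the two worker processes compute, as described elsewhere in the thread.

```python
def speedup(sequential_s: float, parallel_s: float) -> float:
    """Classic speedup: sequential wall-clock time divided by parallel time."""
    return sequential_s / parallel_s


SEQUENTIAL_S = 24.0  # reported sequential run
WORKERS = 2          # assumption: only the two worker processes compute

# Parallel timings reported in the thread, keyed by process placement.
TIMINGS_S = {
    "workers on different nodes": 13.0,
    "workers on the same node": 23.0,
}

for placement, parallel_s in TIMINGS_S.items():
    s = speedup(SEQUENTIAL_S, parallel_s)
    print(f"{placement}: speedup {s:.2f}x, efficiency {s / WORKERS:.0%}")
```

A speedup close to 1 in the same-node case means the second worker contributes almost nothing, which is why the later replies look at memory contention and at whether shared memory is actually in use.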

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-27 Thread Jeff Squyres
If your program runs faster across 3 processes, 2 of which are local to each other, with --mca btl tcp,self compared to --mca btl tcp,sm,self, then something is very, very strange. Tim cites all kinds of things that can cause slowdowns, but it's still very, very odd that simply enabling using t
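One way to set up the comparison described here is to build the two mpirun command lines side by side, so that the BTL list is the only thing that differs. The sketch below only constructs and prints the commands. The application path `./my_app` and the hostfile name `machinefile` are placeholders, not names from the thread, and the timing helper is an untested convenience that would need a working Open MPI installation.

```python
import shlex
import subprocess
import time


def mpirun_command(btl, nprocs=3, hostfile="machinefile", app="./my_app"):
    """Build an mpirun argument list that differs only in the BTL selection.

    `hostfile` and `app` are placeholder names chosen for illustration.
    """
    return [
        "mpirun", "-np", str(nprocs),
        "--hostfile", hostfile,
        "--mca", "btl", ",".join(btl),
        app,
    ]


def time_run(cmd):
    """Run a command and return its wall-clock time in seconds (requires mpirun)."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


# The two configurations being compared in this message.
CONFIGS = {
    "tcp only": ["tcp", "self"],
    "tcp + shared memory": ["tcp", "sm", "self"],
}

for label, btl in CONFIGS.items():
    print(f"{label}: {shlex.join(mpirun_command(btl))}")
```

Keeping the process count and hostfile identical across both runs avoids mixing in the placement differences that come up later in the thread.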

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-27 Thread Ralph Castain
On Mar 27, 2011, at 7:37 AM, Tim Prince wrote: > On 3/27/2011 2:26 AM, Michele Marena wrote: >> Hi, >> My application performs well without shared memory, but with >> shared memory I get worse performance than without it. >> Am I making a mistake? Is there something I'm overlooking?

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-27 Thread Tim Prince
On 3/27/2011 2:26 AM, Michele Marena wrote: Hi, My application performs well without shared memory, but with shared memory I get worse performance than without it. Am I making a mistake? Is there something I'm overlooking? I know OpenMPI uses the /tmp directory to allocate shared memory

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-27 Thread Michele Marena
This is my machinefile:
node-1-16 slots=2
node-1-17 slots=2
node-1-18 slots=2
node-1-19 slots=2
node-1-20 slots=2
node-1-21 slots=2
node-1-22 slots=2
node-1-23 slots=2
Each cluster node has 2 processors. I launch my application with 3 processes, one on node-1-16 (manager) and two on node-1-17(worke
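To check how many processes a machinefile like this one can host, a small parser is enough. The sketch below is a hypothetical helper, not part of Open MPI: it reads only the `slots=N` option and assumes a host with no `slots` entry counts as one slot, which may not match every Open MPI version.

```python
# Machinefile contents as quoted in the message.
MACHINEFILE = """\
node-1-16 slots=2
node-1-17 slots=2
node-1-18 slots=2
node-1-19 slots=2
node-1-20 slots=2
node-1-21 slots=2
node-1-22 slots=2
node-1-23 slots=2
"""


def parse_machinefile(text):
    """Return {hostname: slots} for each non-empty, non-comment line."""
    slots = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        host, *options = line.split()
        count = 1  # assumption: a host without "slots=" provides one slot
        for option in options:
            key, _, value = option.partition("=")
            if key == "slots":
                count = int(value)
        slots[host] = count
    return slots


hosts = parse_machinefile(MACHINEFILE)
print(f"{len(hosts)} nodes, {sum(hosts.values())} slots in total")
```

With two slots per node, a 3-process run can place the two workers on one node, which is the configuration discussed in the rest of the thread.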

[OMPI users] Shared Memory Performance Problem.

2011-03-27 Thread Michele Marena
Hi, My application performs well without shared memory, but with shared memory I get worse performance than without it. Am I making a mistake? Is there something I'm overlooking? I know OpenMPI uses the /tmp directory to allocate shared memory and it is in the local filesystem. I thank yo