On Mon, Apr 06, 2009 at 02:04:16PM -0700, Eugene Loh wrote:
> Steve Kargl wrote:
> 
> >I recently upgraded OpenMPI from 1.2.9 to 1.3 and then 1.3.1.
> >One of my colleagues reported a dramatic drop in performance
> >with one of his applications.  My investigation shows a factor
> >of 10 drop in communication over the memory bus.  I've placed
> >a figure that illustrates the problem at 
> >
> >http://troutmask.apl.washington.edu/~kargl/ompi_cmp.jpg
> >
> >The legend in the figure has 'ver. 1.2.9  11 <--> 18'.  This
> >means communication between node 11 and node 18 over Gigabit
> >Ethernet in my cluster.  'ver. 1.2.9  20 <--> 20' means
> >communication between processes on node 20 where node 20 has
> >8 processors.  The image clearly shows
> >
> Not so clearly in my mind since I have trouble discriminating between 
> the colors and the overlapping lines and so on.  But I'll take your word 
> for it that the plot illustrates the point you are reporting.

OK.  I've removed the GigE results from the graph and plotted the
data with points as well as lines.  You'll see a red line by itself;
the green and blue lines overlap.  The re-plotted data is now at

http://troutmask.apl.washington.edu/~kargl/ompi_cmp_new.jpg

> It appears that you used to have just better than 1-usec latency (which 
> is reasonable), but then it skyrocketed just over 10x with 1.3.  I did 
> some sm work, but that first appears in 1.3.2.

According to netpipe, I have

version 1.3.1
0: node20.cimu.org
1: node20.cimu.org
Latency: 0.000009131
Sync Time: 0.000018241
Now starting main loop

version 1.2.9
0: node20.cimu.org
1: node20.cimu.org
Latency: 0.000000669
Sync Time: 0.000001811

So, the latency has indeed gone up, from about 0.67 usec with 1.2.9
to about 9.1 usec with 1.3.1, a factor of roughly 13.
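
For reference, the number NetPIPE reports as "Latency" is, as far as
I understand it, derived from an averaged small-message ping-pong
round trip.  A minimal sketch of that kind of measurement (this is
not the actual netmpi.c; the file name, repetition count, and output
format are my own assumptions) would look something like:

/* pingpong.c: hypothetical minimal latency test, not the netmpi.c used here.
 * Rank 0 bounces a zero-byte message off rank 1; half of the average
 * round-trip time approximates the one-way latency. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, i, reps = 1000;   /* repetition count is an assumption */
    double t0, t1;
    char buf = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "need at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(&buf, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&buf, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&buf, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&buf, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("Latency: %.9f\n", (t1 - t0) / (2.0 * reps));

    MPI_Finalize();
    return 0;
}

Built and run the same way as the netmpi.c commands below, something
like this should show the same order-of-magnitude difference between
the two installations if the on-node transport is really the culprit.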

> The huge sm latencies are, so far as I know, inconsistent with
> everyone else's experience with 1.3.  Is there any chance you
> could rebuild all three versions and really confirm that the
> observed difference can actually be attributed to differences
> in the OMPI source code?  And/or run with "--mca btl 
> self,sm" to make sure that the on-node message passing is indeed using sm?
> 

The command lines I used are

/usr/local/openmpi-1.2.9/bin/mpicc -o z -O -static GetOpt.c netmpi.c
/usr/local/openmpi-1.2.9/bin/mpiexec -machinefile mf_ompi_2 -n 2 ./z

/usr/local/openmpi-1.3.1/bin/mpicc -o z -O -static GetOpt.c netmpi.c
/usr/local/openmpi-1.3.1/bin/mpiexec --mca btl self,sm -machinefile \
   mf_ompi_2 -n 2 ./z

There is no change in the results, as can be seen at

http://troutmask.apl.washington.edu/~kargl/ompi_cmp_self.sm.jpg

The machinefile contains the single line 'node20.cimu.org slots=2'.


I can rebuild 1.2.9 and 1.3.1.  Are there any particular configure
options that I should enable/disable?

-- 
Steve
