Hi,

I recently upgraded OpenMPI from 1.2.9 to 1.3 and then 1.3.1.
One of my colleagues reported a dramatic drop in performance
with one of his applications.  My investigation shows a factor
of 10 drop in communication performance over the memory bus.
I've placed a figure that illustrates the problem at

http://troutmask.apl.washington.edu/~kargl/ompi_cmp.jpg

The legend in the figure has 'ver. 1.2.9  11 <--> 18'.  This
means communication between node 11 and node 18 over GigE
in my cluster.  'ver. 1.2.9  20 <--> 20' means communication
between processes on node 20, where node 20 has 8 processors.
The image clearly shows that communication over GigE is
consistent among the versions of OpenMPI.  However, some
change in going from 1.2.9 to 1.3.x is causing a drop in
communication performance between processes on a single node.
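
For anyone who wants to reproduce the comparison, the runs take
roughly the following form.  This is only a sketch: ./pingpong is a
placeholder for whatever point-to-point benchmark is used (it is not
the program behind the figure), and node11, node18, and node20 are
assumed host names.  The script only prints the commands unless
RUN=1 is set in the environment.

```shell
#!/bin/sh
# Placeholders (assumptions, not from the original setup):
#   BENCH  - path to a point-to-point benchmark binary
#   MPIRUN - the mpirun of the OpenMPI version under test
BENCH=${BENCH:-./pingpong}
MPIRUN=${MPIRUN:-mpirun}

# Print the command; execute it only when RUN=1 is set.
run() {
    echo "$@"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Two processes on different nodes (traffic goes over GigE).
run "$MPIRUN" -np 2 --host node11,node18 "$BENCH"

# Two processes on the same node, default transport selection.
run "$MPIRUN" -np 2 --host node20 "$BENCH"

# Same node, restricted to the shared-memory and self BTLs.
run "$MPIRUN" -np 2 --host node20 --mca btl self,sm "$BENCH"
```

Pointing MPIRUN at each installation in turn (for example
/usr/local/openmpi-1.2.9/bin/mpirun) gives one set of runs per
version.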

Things to note.  Nodes 11, 18, and 20 are essentially idle
before and after a test.  configure was run with the same set
of options, except that with 1.3 and 1.3.1 I needed to disable IPv6:

  ./configure --prefix=/usr/local/openmpi-1.2.9 \
   --enable-orterun-prefix-by-default --enable-static \
   --disable-shared

  ./configure --prefix=/usr/local/openmpi-1.3.1 \
   --enable-orterun-prefix-by-default --enable-static \
   --disable-shared --disable-ipv6

The operating system is FreeBSD 8.0.  Nodes 18 and 20 are
quad-core, dual-CPU Opteron-based systems and node 11 is a
dual-core, dual-CPU Opteron-based system.  For additional
information, I've placed the output of ompi_info at

http://troutmask.apl.washington.edu/~kargl/ompi_info-1.2.9
http://troutmask.apl.washington.edu/~kargl/ompi_info-1.3.0
http://troutmask.apl.washington.edu/~kargl/ompi_info-1.3.1

Any hints on tuning 1.3.1 would be appreciated.

-- 
steve