Steve Kargl wrote:

I recently upgraded OpenMPI from 1.2.9 to 1.3 and then 1.3.1.
One of my colleagues reported a dramatic drop in performance
with one of his applications.  My investigation shows a factor-of-10
drop in communication performance over the memory bus.  I've placed
a figure that illustrates the problem at
http://troutmask.apl.washington.edu/~kargl/ompi_cmp.jpg

The legend in the figure has 'ver. 1.2.9  11 <--> 18'.  This
means communication between nodes 11 and 18 over GigE in my
cluster.  'ver. 1.2.9 20 <--> 20' means communication between
processes on node 20, which has 8 processors.  The image clearly shows

Not so clearly in my mind since I have trouble discriminating between the colors and the overlapping lines and so on. But I'll take your word for it that the plot illustrates the point you are reporting.

It appears that you used to have just better than 1-usec latency (which is reasonable), but then it skyrocketed by just over 10x with 1.3. I did some work on the sm (shared-memory) BTL, but that first appears in 1.3.2. Such huge sm latencies are, as far as I know, inconsistent with everyone else's experience with 1.3. Is there any chance you could rebuild all three versions and confirm that the observed difference really can be attributed to differences in the OMPI source code? And/or run with "--mca btl self,sm" to make sure that on-node message passing is indeed using sm?
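A minimal shell sketch of the suggested check: rerun the same on-node test against each build with "--mca btl self,sm", so that only the loopback and shared-memory transports are allowed. The install prefixes and the ./pingpong-<version> binaries are hypothetical placeholders; because these builds are static (--enable-static --disable-shared), the benchmark has to be relinked against each version separately.

```shell
#!/bin/sh
# Sketch only: rerun one on-node latency test against each Open MPI
# install, restricting the BTLs to "self" (loopback) and "sm" (shared
# memory) so on-node traffic cannot silently go over TCP instead.
# Hypothetical assumptions: installs live under /usr/local/openmpi-<ver>,
# and ./pingpong-<ver> is the same benchmark source linked against <ver>.

VERSIONS="1.2.9 1.3 1.3.1"

# Print the mpirun command line for one version. With no hostfile,
# both ranks are started on the local node.
sm_cmd() {
  echo "/usr/local/openmpi-$1/bin/mpirun -np 2 --mca btl self,sm ./pingpong-$1"
}

# Run the test for every version, or only print the commands
# when DRY_RUN is set to a non-empty value.
run_all() {
  for v in $VERSIONS; do
    cmd=$(sm_cmd "$v")
    echo "== Open MPI $v: $cmd"
    [ -n "$DRY_RUN" ] || $cmd
  done
}
```

Run `DRY_RUN=1 run_all` first to preview the three command lines, then `run_all` to execute them. Restricting the BTL list this way should make the job fail outright if sm cannot be used, rather than quietly falling back to another transport, which is the point of the check.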

that communication over
GigE is consistent across the versions of OpenMPI.  However, some
change in going from 1.2.9 to 1.3.x is causing a drop in
performance for communication between processes on a single node.

Things to note.  Nodes 11, 18, and 20 are essentially idle
before and after a test.  configure was run with the same set
of options, except that for 1.3 and 1.3.1 I needed to disable IPv6:

 ./configure --prefix=/usr/local/openmpi-1.2.9 \
  --enable-orterun-prefix-by-default --enable-static \
  --disable-shared

 ./configure --prefix=/usr/local/openmpi-1.3 \
  --enable-orterun-prefix-by-default --enable-static \
  --disable-shared --disable-ipv6

 ./configure --prefix=/usr/local/openmpi-1.3.1 \
  --enable-orterun-prefix-by-default --enable-static \
  --disable-shared --disable-ipv6

The operating system is FreeBSD 8.0, where nodes 18 and 20
are quad-core, dual-CPU Opteron-based systems and node 11
is a dual-core, dual-CPU Opteron-based system.  For additional
information, I've placed the output of ompi_info at

http://troutmask.apl.washington.edu/~kargl/ompi_info-1.2.9
http://troutmask.apl.washington.edu/~kargl/ompi_info-1.3.0
http://troutmask.apl.washington.edu/~kargl/ompi_info-1.3.1

Any hints on tuning 1.3.1 would be appreciated.
