Hi,

I recently upgraded OpenMPI from 1.2.9 to 1.3 and then to 1.3.1. One of my colleagues reported a dramatic drop in performance with one of his applications. My investigation shows a factor of 10 drop in communication performance over the memory bus. I've placed a figure that illustrates the problem at

http://troutmask.apl.washington.edu/~kargl/ompi_cmp.jpg

The legend in the figure has 'ver. 1.2.9 11 <--> 18'. This means communication between node 11 and node 18 over GigE in my cluster. 'ver. 1.2.9 20 <--> 20' means communication between processes on node 20, where node 20 has 8 processors. The image clearly shows that communication over GigE is consistent among the versions of OpenMPI. However, some change in going from 1.2.9 to 1.3.x is causing a drop in communication performance between processes on a single node.

Things to note:

Nodes 11, 18, and 20 are essentially idle before and after a test.

configure was run with the same set of options, except that with 1.3 and 1.3.1 I needed to disable ipv6:

  ./configure --prefix=/usr/local/openmpi-1.2.9 \
    --enable-orterun-prefix-by-default --enable-static --disable-shared

  ./configure --prefix=/usr/local/openmpi-1.3.1 \
    --enable-orterun-prefix-by-default --enable-static --disable-shared --disable-ipv6

The operating system is FreeBSD 8.0. Nodes 18 and 20 are quad-core, dual-CPU Opteron-based systems, and node 11 is a dual-core, dual-CPU Opteron-based system.

For additional information, I've placed the output of ompi_info at

http://troutmask.apl.washington.edu/~kargl/ompi_info-1.2.9
http://troutmask.apl.washington.edu/~kargl/ompi_info-1.3.0
http://troutmask.apl.washington.edu/~kargl/ompi_info-1.3.1

Any hints on tuning 1.3.1 would be appreciated.

-- 
Steve