On Oct 19, 2005, at 12:04 AM, Allan Menezes wrote:

We've done Linpack runs recently with Infiniband, which result in performance comparable to MVAPICH, but not with the TCP port. Can you try running with an earlier version by specifying the following on the command line:

-mca pml teg
Hi Tim,
I tried the same cluster (16-node x86) with the -mca pml teg switch and I get good performance of 24.52 GFlops at N=22500 and block size NB=120.
My command line now looks like:
a1> mpirun -mca pls_rsh_orted /home/allan/openmpi/bin/orted -mca pml teg -hostfile aa -np 16 ./xhpl
hostfile = aa, containing the addresses of the 16 machines.
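(For reference, such a hostfile is just a plain-text list of hosts, one per line, optionally with a slot count; the addresses below are placeholders rather than the real machines.)

  192.168.0.1 slots=1
  192.168.0.2 slots=1
  # ...one line per machine, 16 in all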
I am using a GS116 16-port Netgear Gigabit Ethernet switch with Gnet Realtek gigabit Ethernet cards. Why, PLEASE, does the pml teg switch make such a difference? It's 2.6 times more performance in GFlops than what I was getting without it.
I tried version rc3 and not an earlier version.
Thank you very much for your assistance!

Sorry for the delay in replying to this...

The "pml teg" switch tells Open MPI to use the 2nd generation TCP implementation rather than the 3rd generation TCP. More specifically, the "PML" is the point-to-point management layer. There are 2 different components for this -- teg (2nd generation) and ob1 (3rd generation). "ob1" is the default; specifying "--mca pml teg" tells Open MPI to use the "teg" component instead of ob1.

Note, however, that teg and ob1 know nothing about TCP -- it's the 2nd order implications that make the difference here. teg and ob1 use different back-end components to talk across networks:

- teg uses PTL components (point-to-point transport layer -- 2nd gen)
- ob1 uses BTL components (byte transfer layer -- 3rd gen)

We obviously have TCP implementations for both the PTL and BTL. Considerable time was spent optimizing the TCP PTL (i.e., 2nd gen). Unfortunately, as yet, little time has been spent optimizing the TCP BTL (i.e., 3rd gen) -- it was a simple port, nothing more.
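If you want to be certain that both runs are going over TCP and nothing else, the transports can be restricted explicitly. Something like the following should work; the btl form is the usual one for ob1, while the ptl form is my rough guess at the equivalent syntax for the older framework, so exact names may vary by build:

  mpirun -mca pml ob1 -mca btl tcp,self -hostfile aa -np 16 ./xhpl   # 3rd gen stack, TCP BTL only
  mpirun -mca pml teg -mca ptl tcp,self -hostfile aa -np 16 ./xhpl   # 2nd gen stack, TCP PTL only

("self" handles a process sending to itself; it is needed whenever the transport list is narrowed by hand.)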

We have spent the majority of our time, so far, optimizing the Myrinet and Infiniband BTLs (which shows that excellent performance is achievable in the BTLs). However, I'm quite disappointed by the TCP BTL performance -- it sounds like we have a protocol mismatch that is arbitrarily slowing everything down, and that is something that needs to be fixed before 1.0 (it's not a problem with the BTL design, since IB and Myrinet performance is quite good -- just a problem/bug in the TCP implementation of the BTL). That much performance degradation is clearly unacceptable.
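If anyone wants to help narrow this down, a raw point-to-point test between two nodes takes HPL's computation out of the picture and isolates the TCP protocol path. For example, running NetPIPE's MPI driver over each stack in turn would show the bandwidth difference directly (the host names below are placeholders, and the ptl line carries the same caveat as above):

  mpirun -mca pml ob1 -mca btl tcp,self -host node01,node02 -np 2 ./NPmpi   # 3rd gen TCP (BTL)
  mpirun -mca pml teg -mca ptl tcp,self -host node01,node02 -np 2 ./NPmpi   # 2nd gen TCP (PTL)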

--
Jeff Squyres
The Open MPI Project
http://www.open-mpi.org/
