Thanks for keeping on this... Hopefully this answers all the questions:

The cluster has some blades with XRC and others without; I've tested on both with the same results. For MVAPICH, a flag is set to turn on XRC. I'm not sure how OpenMPI handles it, but my build is configured with --enable-openib-connectx-xrc.
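On the Open MPI side, my understanding (mostly from the FAQ, and not something I've verified) is that --enable-openib-connectx-xrc only builds in XRC support, and that XRC is actually requested per run by giving the openib BTL "X"-type receive queues. Roughly, with illustrative (not tuned) queue sizes and placeholder host names:

  # Ask the openib BTL for XRC ("X") receive queues; sizes are illustrative:
  mpirun -np 2 --host node01,node02 \
         --mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32 \
         ./subounce

  # Check what receive queues my build defaults to:
  ompi_info --param btl openib | grep receive_queues

If that understanding is wrong, please correct me.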
OpenMPI is built on a head node with a 2-port HCA (1 active) and installed on a shared file system. The compute blades I'm using have 1-port InfiniHost III HCAs. As for nRepeats in bounce, I could increase it, but if that were the problem then I'd expect MVAPICH to report sporadic results as well. I just downloaded the OSU benchmarks and tried osu_latency... It reports ~40 microseconds for OpenMPI and ~3 microseconds for MVAPICH. Still puzzled... (A sketch of the kind of runs I'm comparing is at the bottom of this message.)

Steve

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Pavel Shamis (Pasha)
Sent: Thursday, February 18, 2010 3:33 AM
To: Open MPI Users
Subject: Re: [OMPI users] Bad Infiniband latency with subounce

Hey,
I may only add that XRC and RC have the same latency.

What is the command line that you use to run this benchmark? What is the system configuration (one HCA, one active port)? Any additional information about the system configuration, MPI command line, etc. will help to analyze your issue.

Regards,
Pasha (Mellanox guy :-) )

Jeff Squyres wrote:
> I'll defer to the Mellanox guys to reply in more detail, but here are a few thoughts:
>
> - Is MVAPICH using XRC? (I never played with XRC much; it would surprise me if it caused instability on the order of up to 100 microseconds -- I ask just to see if it is an apples-to-apples comparison.)
>
> - The nRepeats value in this code is only 10, meaning that it only seems to be doing 10 iterations at each size. For small sizes, this might well not be enough to be accurate. Have you tried increasing it? Or using a different benchmark app, such as NetPIPE, osu_latency, etc.?
>
> On Feb 16, 2010, at 8:49 AM, Repsher, Stephen J wrote:
>
>> Well, the "good" news is I can end your debate over binding here... setting mpi_paffinity_alone 1 did nothing. (And personally, as a user, I don't care what the default is so long as the info is readily apparent in the main docs... and I did see the FAQs on it.)
>>
>> It did lead me to try another parameter though, -mca mpi_preconnect_all 1, which seems to reliably reduce the measured latency of subounce, but it's still sporadic and on the order of ~10-100 microseconds. It leads me to think that OpenMPI has issues with the method of measurement, which is simply to send progressively larger blocking messages right after calling MPI_Init (starting at 0 bytes, which it times as the latency). OpenMPI's lazy connections clearly mess with this.
>>
>> But still not consistently 1-2 microsecs...
>>
>> Steve
>>
>> -----Original Message-----
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
>> Sent: Monday, February 15, 2010 11:21 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Bad Infiniband latency with subounce
>>
>> On Feb 15, 2010, at 8:44 PM, Terry Frankcombe wrote:
>>
>>> On Mon, 2010-02-15 at 20:18 -0700, Ralph Castain wrote:
>>>
>>>> Did you run it with -mca mpi_paffinity_alone 1? Given this is 1.4.1, you can set the bindings to -bind-to-socket or -bind-to-core. Either will give you improved performance.
>>>>
>>>> IIRC, MVAPICH defaults to -bind-to-socket. OMPI defaults to no binding.
>>>>
>>> Is this sensible? Won't most users want processes bound? OMPI's supposed to "do the right thing" out of the box, right?
>>>
>> Well, that depends on how you look at it. It's been the subject of a lot of debate within the devel community. If you bind by default and it is a shared-node cluster, then you can really mess people up.
>> On the other hand, if you don't bind by default, then people who run benchmarks without looking at the options can get bad numbers. Unfortunately, there is no automated way to tell if the cluster is configured for shared use or dedicated nodes.
>>
>> I honestly don't know that "most users want processes bound". One installation I was at set binding by default using the system MCA param file, and got yelled at by a group of users that had threaded apps - and most definitely did -not- want their processes bound. After a while, it became clear that nothing we could do would make everyone happy :-/
>>
>> I doubt there is a right/wrong answer - at least, we sure can't find one. So we don't bind by default so we "do no harm", and put out FAQs, man pages, mpirun option help messages, etc. that explain the situation and tell you when/how to bind.
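
Sketch referenced above (re: Pasha's command-line question): this is just the shape of the comparison I'm running, with placeholder host names, not my actual job script.

  # Default run: Open MPI sets up IB connections lazily, so the first
  # message to each peer also pays the connection-setup cost.
  mpirun -np 2 --host node01,node02 ./osu_latency

  # Pre-establish all connections and pin each rank, which should take
  # connection setup and any process migration out of the measurement
  # (with 1.4.1, -bind-to-core or -bind-to-socket could be used instead
  # of mpi_paffinity_alone):
  mpirun -np 2 --host node01,node02 \
         --mca mpi_preconnect_all 1 --mca mpi_paffinity_alone 1 ./osu_latency

If the ~40 vs. ~3 microsecond gap survives both of those settings, then lazy connection setup alone probably isn't the explanation.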