Thanks for sticking with this... Hopefully this answers all the questions:

The cluster has some blades with XRC, others without.  I've tested on both with 
the same results. For MVAPICH, a flag is set to turn on XRC; I'm not sure how 
OpenMPI handles it, but my build is configured with --enable-openib-connectx-xrc.

OpenMPI is built on a head node with a 2-port HCA (1 active) and installed on a 
shared file system.  The compute blades I'm using have InfiniHost III HCAs, each 
with a single port.

As for nRepeats in bounce, I could increase it, but if that were the problem 
then I'd expect MVAPICH to report sporadic results as well.

I just downloaded the OSU benchmarks and tried osu_latency... It reports ~40 
microsecs for OpenMPI and ~3 microsecs for MVAPICH.  Still puzzled...
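
For comparison, here's roughly what an osu_latency-style ping-pong does -- just a 
sketch of my own, not the actual OSU source -- with warm-up iterations so that 
lazy connection setup isn't counted in the average:

  /* pingpong.c -- minimal 0-byte latency sketch (illustrative only) */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      const int warmup = 100, iters = 10000;
      char buf[1];
      int rank, i;
      double t0 = 0.0, t1;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      for (i = 0; i < warmup + iters; i++) {
          if (i == warmup) {               /* start the clock after warm-up */
              MPI_Barrier(MPI_COMM_WORLD);
              t0 = MPI_Wtime();
          }
          if (rank == 0) {
              MPI_Send(buf, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
              MPI_Recv(buf, 0, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          } else if (rank == 1) {
              MPI_Recv(buf, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
              MPI_Send(buf, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
          }
      }
      t1 = MPI_Wtime();

      if (rank == 0)   /* one-way latency = half the average round trip */
          printf("0-byte latency: %.2f us\n", (t1 - t0) * 1e6 / (2.0 * iters));

      MPI_Finalize();
      return 0;
  }

Running something like that with "mpirun -np 2 -host <bladeA>,<bladeB> ./pingpong" 
(hostnames are placeholders) on the same pair of blades under both MPIs should 
show whether the ~40 microsecs is a measurement artifact or real.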

Steve


-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Pavel Shamis (Pasha)
Sent: Thursday, February 18, 2010 3:33 AM
To: Open MPI Users
Subject: Re: [OMPI users] Bad Infiniband latency with subounce

Hey,
I can only add that XRC and RC have the same latency.
What is the command line that you use to run this benchmark?
What is the system configuration (one HCA, one active port)?
Any additional information about the system configuration, MPI command line, etc. 
will help us analyze your issue.

Regards,
Pasha (Mellanox guy :-) )

Jeff Squyres wrote:
> I'll defer to the Mellanox guys to reply in more detail, but here are a few 
> thoughts:
>
> - Is MVAPICH using XRC?  (I never played with XRC much; it would 
> surprise me if it caused instability of up to ~100 microseconds 
> -- I ask just to see if it is an apples-to-apples comparison)
>
> - The nRepeats value in this code is only 10, meaning that it seems to do only 
> 10 iterations at each size.  For small sizes, this might well not be enough to 
> be accurate.  Have you tried increasing it?  Or using a different benchmark 
> app, such as NetPIPE, osu_latency, etc.?
>
>
>
> On Feb 16, 2010, at 8:49 AM, Repsher, Stephen J wrote:
>
>   
>> Well the "good" news is I can end your debate over binding here...setting 
>> mpi_paffinity_alone 1 did nothing. (And personally as a user, I don't care 
>> what the default is so long as info is readily apparent in the main 
>> docs...and I did see the FAQs on it).
>>
>> It did lead me to try another parameter though, -mca mpi_preconnect_all 1, 
>> which seems to reliably reduce the measured latency of subounce, but it's 
>> still sporadic and on the order of ~10-100 microseconds.  It leads me to 
>> think that OpenMPI has issues with the method of measurement, which is simply 
>> to send progressively larger blocking messages right after calling MPI_Init 
>> (starting at 0 bytes, which it times as the latency).  OpenMPI's lazy 
>> connections clearly mess with this.
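>>
>> To illustrate what I mean: as I understand it, mpi_preconnect_all forces the 
>> connections up front, and the same thing can be done by hand with a throwaway 
>> exchange before the timed section. A rough sketch of my own (not code from 
>> subounce or OpenMPI):
>>
>>   /* Touch every peer once with a zero-byte exchange right after MPI_Init,
>>    * so lazy connection setup happens here instead of in the first timed
>>    * message. */
>>   #include <mpi.h>
>>
>>   static void warm_connections(MPI_Comm comm)
>>   {
>>       int rank, size, shift;
>>       char dummy = 0;
>>
>>       MPI_Comm_rank(comm, &rank);
>>       MPI_Comm_size(comm, &size);
>>
>>       /* shifted exchange: at step s, send to rank+s and receive from rank-s */
>>       for (shift = 1; shift < size; shift++) {
>>           int to   = (rank + shift) % size;
>>           int from = (rank - shift + size) % size;
>>           MPI_Sendrecv(&dummy, 0, MPI_CHAR, to,   99,
>>                        &dummy, 0, MPI_CHAR, from, 99,
>>                        comm, MPI_STATUS_IGNORE);
>>       }
>>       MPI_Barrier(comm);   /* everyone wired up before the clock starts */
>>   }
>>
>> With something like that called before the first timed send (or with 
>> -mca mpi_preconnect_all 1), connection setup should at least stay out of the 
>> 0-byte number.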
>>
>> But still not consistently 1-2 microsecs...
>>
>> Steve
>>
>>
>> -----Original Message-----
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] 
>> On Behalf Of Ralph Castain
>> Sent: Monday, February 15, 2010 11:21 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Bad Infiniband latency with subounce
>>
>>
>> On Feb 15, 2010, at 8:44 PM, Terry Frankcombe wrote:
>>
>>     
>>> On Mon, 2010-02-15 at 20:18 -0700, Ralph Castain wrote:
>>>       
>>>> Did you run it with -mca mpi_paffinity_alone 1? Given this is 1.4.1, you 
>>>> can set the bindings to -bind-to-socket or -bind-to-core. Either will give 
>>>> you improved performance.
>>>>
>>>> IIRC, MVAPICH defaults to -bind-to-socket. OMPI defaults to no binding.
>>>>         
>>> Is this sensible?  Won't most users want processes bound?  OMPI's 
>>> supposed to "do the right thing" out of the box, right?
>>>       
>> Well, that depends on how you look at it. It's been the subject of a lot of 
>> debate within the devel community. If you bind by default and it is a 
>> shared-node cluster, then you can really mess people up. On the other hand, 
>> if you don't bind by default, then people who run benchmarks without looking 
>> at the options can get bad numbers. Unfortunately, there is no automated way 
>> to tell if the cluster is configured for shared use or dedicated nodes.
>>
>> I honestly don't know that "most users want processes bound". One 
>> installation I was at set binding by default using the system mca 
>> param file, and got yelled at by a group of users that had threaded 
>> apps - and most definitely did -not- want their processes bound. 
>> After a while, it became clear that nothing we could do would make 
>> everyone happy :-/
>>
>> I doubt there is a right/wrong answer - at least, we sure can't find one. So 
>> we don't bind by default, to "do no harm", and put out FAQs, man pages, 
>> mpirun option help messages, etc. that explain the situation and tell you 
>> when/how to bind.