ovs-atomic-i586: Faster 64-bit atomics on 32-bit builds with SSE.

Jarno Rajahalme Fri, 03 Oct 2014 17:06:53 -0700



> On Oct 3, 2014, at 3:05 PM, Ben Pfaff <[email protected]> wrote:
> 
>> On Thu, Oct 02, 2014 at 09:14:42AM -0700, Jarno Rajahalme wrote:
>> 
>>> On Oct 1, 2014, at 4:38 PM, Jarno Rajahalme <[email protected]> wrote:
>>> 
>>> 
>>>> On Sep 26, 2014, at 11:20 AM, Ben Pfaff <[email protected]> wrote:
>>>> 
>>>>> On Wed, Sep 24, 2014 at 11:24:00AM -0700, Jarno Rajahalme wrote:
>>>>> Aligned 64-bit memory accesses in i586 are atomic.  By using an SSE
>>>>> register we can make such memory accesses in one instruction without
>>>>> bus-locking.  Need to compile with -msse to enable this feature.
>>>>> 
>>>>> Signed-off-by: Jarno Rajahalme <[email protected]>
>>>> 
>>>> I guess that ovs-atomic-i586 must be aimed at older versions of
>>>> XenServer, which always run on 64-bit capable processors but in 32-bit
>>>> mode.  That means that we can always build with -msse for XenServer.
>>>> Should we patch xenserver/openvswitch-xen.spec to do that?
>>> 
>>> Yes, I think we should do that. Maybe you are familiar with that file 
>>> already, so?
>> 
>> 64-bit capable CPUs have sse2, so better make it -msse2.
> 
> OK, I'll work on a patch.
> 
>>>> The non-SSE code in atomic_read_8__() is very clever.  I am not sure
>>>> that I would have thought of using the existing value in EBX:ECX as
>>>> the value to write as well.  It works around the PIC issue very well,
>>>> without needing any extra code.
>>> 
>>> That cleverness I must have borrowed from somewhere else.
>>> 
>>>> I am not sure why the asm statements for reading atomic variables are
>>>> volatile.  I don't think they have any side effects.
>>> 
>>> GCC manual:
>>> 
>>> "6.42.2.1 Volatile
>>> 
>>> GCC's optimizers sometimes discard asm statements if they
>>> determine there is no need for the output variables. Also, the
>>> optimizers may move code out of loops if they believe that the
>>> code will always return the same result (i.e. none of its input
>>> values change between calls). Using the volatile qualifier
>>> disables these optimizations. asm statements that have no output
>>> operands are implicitly volatile."
>>> 
>>> 
>>> Reading an atomic variable in a loop may return a different value,
>>> even when the input operand (an address) is the same, as another
>>> thread may be writing to the same variable, so the optimizations
>>> mentioned above should be disabled. Or do you think that the fact
>>> that the pointer itself is defined as volatile is enough?
>> 
>> I added some more testing for this and removed the volatiles from
>> atomic read asm lines.
> 
> Hmm.  I should have replied more quickly.  I had this idea that
> volatile only related to side effects, but your rationale for using
> volatile makes sense to me.  I guess that based on your testing you
> are confident that volatile is not needed after all?
> 

Yes, I updated the tests to stress the 64-bit atomic read. If the read had been
moved out of the loop, the test would have hung. So it seems the volatile
pointer is enough.

  Jarno

> Thanks,
> 
> Ben.
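
For reference, the trick Ben describes above for the non-SSE read (using
the same value as both the comparison value and the value to write) can
be sketched portably. This is only an illustration under assumptions: it
uses the GCC/Clang __atomic_compare_exchange_n builtin rather than the
inline asm from the patch, and read_u64_via_cas is a hypothetical name,
not the actual atomic_read_8__() from ovs-atomic-i586.

```c
#include <stdint.h>

/* Hypothetical helper, not the OVS atomic_read_8__(): read a 64-bit value
 * with a compare-and-swap whose "expected" and "desired" values are the
 * same arbitrary guess. */
static uint64_t
read_u64_via_cas(uint64_t *p)
{
    uint64_t guess = 0;     /* Any value works; 0 is arbitrary. */

    /* If *p == guess, the CAS stores 'guess' back, so *p keeps its value.
     * Otherwise nothing is stored and the builtin copies the current *p
     * into 'guess'.  Either way 'guess' ends up holding the value of *p. */
    __atomic_compare_exchange_n(p, &guess, guess, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return guess;
}
```

Because the value written on a match equals the value already stored, the
read never changes memory contents. In the asm version the same idea means
the registers holding the value to write need no special setup, which is
what avoids touching the PIC register.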
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH 1/3] lib/ovs-atomic-i586: Faster 64-bit atomics on 32-bit builds with SSE.