Re: [RFC PATCH, i386]: Use "lock orl $0, -4(%esp)" in mfence_nosse

Uros Bizjak Fri, 17 Feb 2017 09:00:02 -0800

On Fri, Feb 17, 2017 at 5:30 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Sun, May 29, 2016 at 11:10:15PM +0200, Uros Bizjak wrote:
>> As explained in PR71245, comment #3 [1], it is better to use offset -4
>> to a %esp to implement a non-SSE memory fence instruction:
>>
>> -q-
>>
>> I guess it costs a code byte for a disp8 in the addressing mode, but
>> it avoids adding a lot of latency to a critical path involving a
>> spill/reload to (%esp), in functions where there is something at
>> (%esp).
>>
>> If it's an object larger than 4B, the lock orl could even cause a
>> store-forwarding stall when the object is reloaded.  (e.g. a double or
>> a vector).
>>
>> Ideally we could do the  lock orl  on some padding between two locals,
>> or on something in memory that wasn't going to be loaded soon, to
>> avoid touching more stack memory (which might be in the next page
>> down).  But we still want to do it on a cache line that's hot, so
>> going way up above our own stack frame isn't good either.
>
> Unfortunately this makes valgrind unhappy about that:
> https://bugzilla.redhat.com/show_bug.cgi?id=1423434
> I assume it will complain now on anything pre-SSE2 that contains the memory
> barrier in 32-bit code.
> Perhaps we should decrement and increment %esp around it or something
> similar (or push/pop)?  Of course, that would mean we need to take care
> of async unwind info.


Or, we can simply revert the patch? Not that the barrier performance
of non-SSE 32bit targets matter...

Uros.

Re: [RFC PATCH, i386]: Use "lock orl $0, -4(%esp)" in mfence_nosse

Reply via email to