On 10/02/16 15:39, Jan Beulich wrote:
>>>> On 10.02.16 at 16:03, <andrew.coop...@citrix.com> wrote:
>> On 10/02/16 12:57, Jan Beulich wrote:
>>> Also drop an unnecessary va adjustment in the code being touched.
>>>
>>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
>>>
>>> --- a/xen/arch/x86/flushtlb.c
>>> +++ b/xen/arch/x86/flushtlb.c
>>> @@ -139,10 +139,12 @@ unsigned int flush_area_local(const void
>>>               c->x86_clflush_size && c->x86_cache_size && sz &&
>>>               ((sz >> 10) < c->x86_cache_size) )
>>>          {
>>> -            va = (const void *)((unsigned long)va & ~(sz - 1));
>>> +            alternative(ASM_NOP3, "sfence", X86_FEATURE_CLFLUSHOPT);
>> Why separate?  This would be better in the lower alternative(), with one
>> single nop making up the difference in length.  That way, processors
>> without CLFLUSHOPT don't suffer the 1 cycle instruction decode stall
>> from the redundant rex prefix.
> Why would we want the fence inside the loop - a single fence is
> sufficient for the entire flush.

Ah yes - of course.

>
> Also if we're worried about the REX decode, this could easily be a
> NOP instead, just that I'm not certain which one in the end is less
> decode overhead.

A redundant prefix will generally have a lower overhead than a full new
instruction.

Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Reply via email to