On 06/05/2020 08:08, Roger Pau Monné wrote:
> On Tue, May 05, 2020 at 07:25:39PM +0100, Andrew Cooper wrote:
>> AMD Fam15h processors introduced the flush-by-asid feature, for more fine
>> grain flushing purposes.
>>
>> Flushing everything including ASID 0 (i.e. Xen context) is an an 
>> unnecesserily
>> large hammer, and never necessary in the context of guest TLBs needing
>> invalidating.
>>
>> When available, use TLB_CTRL_FLUSH_ASID in preference to TLB_CTRL_FLUSH_ALL.
>>
>> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
> Reviewed-by: Roger Pau Monné <roger....@citrix.com>
>
> I should also look into restricting the usage of FLUSH_HVM_ASID_CORE
> and instead perform more fine grained per-vCPU flushes when possible,
> since FLUSH_HVM_ASID_CORE resets the pCPU ASID generation forcing a
> new ASID to be allocated for all vCPUs running on that pCPU.
>
>> ---
>> CC: Jan Beulich <jbeul...@suse.com>
>> CC: Wei Liu <w...@xen.org>
>> CC: Roger Pau Monné <roger....@citrix.com>
>>
>> The APM currently describes tlb_control encoding 1 as "Flush entire
>> TLB (Should be used only by legacy hypervisors.)".  AMD have agreed that this
>> is misleading and should say "legacy hardware" instead.
> AFAICT using TLB_CTRL_FLUSH_ASID on hardware not supporting the
> feature has not been found to be safe? Ie: TLB_CTRL_FLUSH_ALL is a
> subset of TLB_CTRL_FLUSH_ASID from a bitmap PoV, so if those bits
> where ignored on older hardware it should be safe to unconditionally
> use it.

So as far as I can tell, TLB_CTRL_FLUSH_ASID is safe to use on older
hardware, but I was told in no uncertain terms by an AMD architect that
we can't rely on that.

Hence this patch not being s/TLB_CTRL_FLUSH_ALL/TLB_CTRL_FLUSH_ALL/ in
asid.c

>
>> This change does come with a minor observed perf improvement on Fam17h
>> hardware, of ~0.6s over ~22s for my XTF pagewalk test.  This test will spot
>> TLB flushing issues, but isn't optimal for spotting the perf increase from
>> better flushing.  There were no observed differences for Fam15h, but this
>> could simply mean that the measured code footprint was larger than the TLB on
>> this CPU.
>> ---
>>  xen/arch/x86/hvm/svm/asid.c       | 9 ++++++---
>>  xen/include/asm-x86/hvm/svm/svm.h | 1 +
>>  2 files changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/svm/asid.c b/xen/arch/x86/hvm/svm/asid.c
>> index 9be90058c7..ab06dd3f3a 100644
>> --- a/xen/arch/x86/hvm/svm/asid.c
>> +++ b/xen/arch/x86/hvm/svm/asid.c
>> @@ -18,6 +18,7 @@
>>  #include <asm/amd.h>
>>  #include <asm/hvm/nestedhvm.h>
>>  #include <asm/hvm/svm/asid.h>
>> +#include <asm/hvm/svm/svm.h>
>>  
>>  void svm_asid_init(const struct cpuinfo_x86 *c)
>>  {
>> @@ -47,15 +48,17 @@ void svm_asid_handle_vmrun(void)
>>      if ( p_asid->asid == 0 )
>>      {
>>          vmcb_set_guest_asid(vmcb, 1);
>> -        /* TODO: investigate using TLB_CTRL_FLUSH_ASID here instead. */
>> -        vmcb->tlb_control = TLB_CTRL_FLUSH_ALL;
>> +        vmcb->tlb_control =
>> +            cpu_has_svm_flushbyasid ? TLB_CTRL_FLUSH_ASID : 
>> TLB_CTRL_FLUSH_ALL;
>>          return;
>>      }
>>  
>>      if ( vmcb_get_guest_asid(vmcb) != p_asid->asid )
>>          vmcb_set_guest_asid(vmcb, p_asid->asid);
>>  
>> -    vmcb->tlb_control = need_flush ? TLB_CTRL_FLUSH_ALL : TLB_CTRL_NO_FLUSH;
>> +    vmcb->tlb_control =
>> +        !need_flush ? TLB_CTRL_NO_FLUSH :
>> +        cpu_has_svm_flushbyasid ? TLB_CTRL_FLUSH_ASID : TLB_CTRL_FLUSH_ALL;
> Since this code structure is used in two places I would consider
> locally introducing something like:
>
> #define TLB_CTRL_FLUSH_CMD (cpu_has_svm_flushbyasid ? TLB_CTRL_FLUSH_ASID \
>                                                     : TLB_CTRL_FLUSH_ALL)
>
> To abstract it away.

Right, but TLB_CTRL_FLUSH_CMD is easy to confuse as constant in the same
space as TLB_CTRL_FLUSH_*, and the logic isn't going to survive a
conversion to a finer grain flushing in exactly this form.

~Andrew

Reply via email to