On 03.05.2021 17:39, Andrew Cooper wrote:
> Make use of the new xstate_uncompressed_size() helper rather than maintaining
> the running calculation while accumulating feature components.
> 
> The rest of the CPUID data can come direct from the raw cpuid policy.  All
> per-component data forms an ABI through the behaviour of the X{SAVE,RSTOR}*
> instructions, and are constant.
> 
> Use for_each_set_bit() rather than opencoding a slightly awkward version of
> it.  Mask the attributes in ecx down based on the visible features.  This
> isn't actually necessary for any components or attributes defined at the time
> of writing (up to AMX), but is added out of an abundance of caution.
> 
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
> ---
> CC: Jan Beulich <jbeul...@suse.com>
> CC: Roger Pau Monné <roger....@citrix.com>
> CC: Wei Liu <w...@xen.org>
> 
> Using min() in for_each_set_bit() leads to awful code generation, as it
> prohibits the optimiations for spotting that the bitmap is <= BITS_PER_LONG.
> As p->xstate is long enough already, use a BUILD_BUG_ON() instead.
> ---
>  xen/arch/x86/cpuid.c | 52 
> +++++++++++++++++-----------------------------------
>  1 file changed, 17 insertions(+), 35 deletions(-)
> 
> diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
> index 752bf244ea..c7f8388e5d 100644
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -154,8 +154,7 @@ static void sanitise_featureset(uint32_t *fs)
>  static void recalculate_xstate(struct cpuid_policy *p)
>  {
>      uint64_t xstates = XSTATE_FP_SSE;
> -    uint32_t xstate_size = XSTATE_AREA_MIN_SIZE;
> -    unsigned int i, Da1 = p->xstate.Da1;
> +    unsigned int i, ecx_bits = 0, Da1 = p->xstate.Da1;
>  
>      /*
>       * The Da1 leaf is the only piece of information preserved in the common
> @@ -167,61 +166,44 @@ static void recalculate_xstate(struct cpuid_policy *p)
>          return;
>  
>      if ( p->basic.avx )
> -    {
>          xstates |= X86_XCR0_YMM;
> -        xstate_size = max(xstate_size,
> -                          xstate_offsets[X86_XCR0_YMM_POS] +
> -                          xstate_sizes[X86_XCR0_YMM_POS]);
> -    }
>  
>      if ( p->feat.mpx )
> -    {
>          xstates |= X86_XCR0_BNDREGS | X86_XCR0_BNDCSR;
> -        xstate_size = max(xstate_size,
> -                          xstate_offsets[X86_XCR0_BNDCSR_POS] +
> -                          xstate_sizes[X86_XCR0_BNDCSR_POS]);
> -    }
>  
>      if ( p->feat.avx512f )
> -    {
>          xstates |= X86_XCR0_OPMASK | X86_XCR0_ZMM | X86_XCR0_HI_ZMM;
> -        xstate_size = max(xstate_size,
> -                          xstate_offsets[X86_XCR0_HI_ZMM_POS] +
> -                          xstate_sizes[X86_XCR0_HI_ZMM_POS]);
> -    }
>  
>      if ( p->feat.pku )
> -    {
>          xstates |= X86_XCR0_PKRU;
> -        xstate_size = max(xstate_size,
> -                          xstate_offsets[X86_XCR0_PKRU_POS] +
> -                          xstate_sizes[X86_XCR0_PKRU_POS]);
> -    }
>  
> -    p->xstate.max_size  =  xstate_size;
> +    /* Subleaf 0 */
> +    p->xstate.max_size =
> +        xstate_uncompressed_size(xstates & ~XSTATE_XSAVES_ONLY);
>      p->xstate.xcr0_low  =  xstates & ~XSTATE_XSAVES_ONLY;
>      p->xstate.xcr0_high = (xstates & ~XSTATE_XSAVES_ONLY) >> 32;
>  
> +    /* Subleaf 1 */
>      p->xstate.Da1 = Da1;
>      if ( p->xstate.xsaves )
>      {
> +        ecx_bits |= 3; /* Align64, XSS */

Align64 is also needed for p->xstate.xsavec afaict. I'm not really
convinced to tie one to the other either. I would rather think this
is a per-state-component attribute independent of other features.
Those state components could in turn have a dependency (like XSS
ones on XSAVES).

I'm also not happy at all to see you use a literal 3 here. We have
a struct for this, after all.

>          p->xstate.xss_low   =  xstates & XSTATE_XSAVES_ONLY;
>          p->xstate.xss_high  = (xstates & XSTATE_XSAVES_ONLY) >> 32;
>      }
> -    else
> -        xstates &= ~XSTATE_XSAVES_ONLY;
>  
> -    for ( i = 2; i < min(63ul, ARRAY_SIZE(p->xstate.comp)); ++i )
> +    /* Subleafs 2+ */
> +    xstates &= ~XSTATE_FP_SSE;
> +    BUILD_BUG_ON(ARRAY_SIZE(p->xstate.comp) < 63);
> +    for_each_set_bit ( i, &xstates, 63 )
>      {
> -        uint64_t curr_xstate = 1ul << i;
> -
> -        if ( !(xstates & curr_xstate) )
> -            continue;
> -
> -        p->xstate.comp[i].size   = xstate_sizes[i];
> -        p->xstate.comp[i].offset = xstate_offsets[i];
> -        p->xstate.comp[i].xss    = curr_xstate & XSTATE_XSAVES_ONLY;
> -        p->xstate.comp[i].align  = curr_xstate & xstate_align;
> +        /*
> +         * Pass through size (eax) and offset (ebx) directly.  Visbility of
> +         * attributes in ecx limited by visible features in Da1.
> +         */
> +        p->xstate.raw[i].a = raw_cpuid_policy.xstate.raw[i].a;
> +        p->xstate.raw[i].b = raw_cpuid_policy.xstate.raw[i].b;
> +        p->xstate.raw[i].c = raw_cpuid_policy.xstate.raw[i].c & ecx_bits;

To me, going to raw[].{a,b,c,d} looks like a backwards move, to be
honest. Both this and the literal 3 above make it harder to locate
all the places that need changing if a new bit (like xfd) is to be
added. It would be better if grep-ing for an existing field name
(say "xss") would easily turn up all involved places.

Jan

Reply via email to