On 08/04/16 22:00, Jan Beulich wrote:
>>>> On 07.04.16 at 13:57, <andrew.coop...@citrix.com> wrote:
>> The existing logic is broken for heterogeneous migration.  By always
>> advertising the host maximum xstate, a migration to a less capable host 
>> always
>> fails as Xen cannot accomodate the xcr0_accum in the migration stream.
> I don't understand this - xcr0_accum is definitely part of the
> migration stream.

It is.

The problem is that, by using info->xfeature_mask rather than
guest_feature_mask, attempts by the toolstack to level a guest down on a
more capable host fails, and the guest actually does see un-levelled
xstates (i.e. the current host maximum xstates), and enables them.

When the migration (to a less capable host) does occur, the receiving
Xen (correctly) fails the restore, as it cannot accommodate the xstates
currently in use by the guest.

This is one of the several bugs contributing to why XenServer has never
enabled xsave by default.  (The forthcoming XenServer Dundee release
will be the first version with xsave enabled by default, now that
migration bugs like this are addressed.)

>
>> v5:
>>  * Reintroduce PKRU, (again, lost due to rebasing).
>>  * Rewrite the commit message and comments to try and better explain why I am
>>    deliberatly removing host-specific information from the xstate 
>> calculation.
>>  * Reintroduce 0xFFFFFFFF masks for EAX, to avoid Coverity complaining about
>>    truncation on assignment.
> Urgh - I don't think ugliness like this can be justified by Coverity
> complaining. That would be different if the compiler complained
> (like compilers other than gcc do).

Hmm - Clang is happy with the code as was.

I would prefer to avoid the Coverity issue if at all possible.  Would a
(uint32_t) cast be more acceptable?

>
>>  static void xc_cpuid_config_xsave(xc_interface *xch,
>>                                    const struct cpuid_domain_info *info,
>>                                    const unsigned int *input, unsigned int 
>> *regs)
>>  {
>> -    if ( info->xfeature_mask == 0 )
>> +    uint64_t guest_xfeature_mask;
>> +
>> +    if ( info->xfeature_mask == 0 ||
>> +         !test_bit(X86_FEATURE_XSAVE, info->featureset) )
>>      {
>>          regs[0] = regs[1] = regs[2] = regs[3] = 0;
>>          return;
>>      }
>>  
>> +    guest_xfeature_mask = X86_XCR0_SSE | X86_XCR0_X87;
>> +
>> +    if ( test_bit(X86_FEATURE_AVX, info->featureset) )
>> +        guest_xfeature_mask |= X86_XCR0_AVX;
>> +
>> +    if ( test_bit(X86_FEATURE_MPX, info->featureset) )
>> +        guest_xfeature_mask |= X86_XCR0_BNDREG | X86_XCR0_BNDCSR;
>> +
>> +    if ( test_bit(X86_FEATURE_PKU, info->featureset) )
>> +        guest_xfeature_mask |= X86_XCR0_PKRU;
>> +
>> +    if ( test_bit(X86_FEATURE_LWP, info->featureset) )
>> +        guest_xfeature_mask |= X86_XCR0_LWP;
> I continue to be unhappy about that white listing, but well...

It is all going to move into a single location in Xen in due course, but
as I have explained above, it has to exist somewhere to allow the guest
to safely migrate in a heterogeneous environment.

>
>>      case 1: /* leaf 1 */
>>          regs[0] = info->featureset[featureword_of(X86_FEATURE_XSAVEOPT)];
>> -        regs[2] &= info->xfeature_mask;
>> -        regs[3] = 0;
>> +        regs[1] = 0;
>> +
>> +        if ( test_bit(X86_FEATURE_XSAVES, info->featureset) )
>> +        {
>> +            regs[2] = guest_xfeature_mask & X86_XSS_MASK & 0xFFFFFFFF;
>> +            regs[3] = (guest_xfeature_mask >> 32) & X86_XSS_MASK;
> This is correct only because X86_XSS_MASK == 0(or else >> and &
> need to be swapped). Yet if you assume this, the and-ing with
> 0xFFFFFFFF makes even less sense here than above.

So it is.  I will submit a bugfix, once we have got an agreement on the
masking angle.

>
>> +    case 2 ... 62: /* per-component sub-leaves */
>> +        if ( !(guest_xfeature_mask & (1ULL << input[1])) )
>>          {
>>              regs[0] = regs[1] = regs[2] = regs[3] = 0;
>>              break;
>>          }
>>          /* Don't touch EAX, EBX. Also cleanup ECX and EDX */
>> -        regs[2] = regs[3] = 0;
>> +        regs[2] &= XSTATE_XSS | XSTATE_ALIGN64;
> I'm sorry for realizing this only now, but shouldn't we hide XSTATE_XSS
> from PV guests? Or is this being taken care of elsewhere?

That is covered with a dynamic check in pv_cpuid() at the moment.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Reply via email to