On 15.06.2023 12:41, Andrew Cooper wrote:
> On 15/06/2023 9:30 am, Jan Beulich wrote:
>> On 14.06.2023 20:12, Andrew Cooper wrote:
>>> On 13/06/2023 10:59 am, Jan Beulich wrote:
>>>> On 12.06.2023 18:13, Andrew Cooper wrote:
>>>>> The RSBA bit, "RSB Alternative", means that the RSB may use alternative
>>>>> predictors when empty.  From a practical point of view, this means
>>>>> "Retpoline not safe".
>>>>>
>>>>> Enhanced IBRS (officially IBRS_ALL in Intel's docs, previously IBRS_ATT) is a
>>>>> statement that IBRS is implemented in hardware (as opposed to the form
>>>>> retrofitted to existing CPUs in microcode).
>>>>>
>>>>> The RRSBA bit, "Restricted-RSBA", is a combination of RSBA and the eIBRS
>>>>> property that predictions are tagged with the mode in which they were learnt.
>>>>> Therefore, it means "when eIBRS is active, the RSB may fall back to
>>>>> alternative predictors but restricted to the current prediction mode".  As
>>>>> such, it's a stronger statement than RSBA, but still means "Retpoline not
>>>>> safe".
>>>>>
>>>>> CPUs are not expected to enumerate both RSBA and RRSBA.
>>>>>
>>>>> Add feature dependencies for EIBRS and RRSBA.  While technically they're not
>>>>> linked, absolutely nothing good can come of letting the guest see RRSBA
>>>>> without EIBRS, nor of letting it see EIBRS without IBRSB.  Furthermore, we
>>>>> use this dependency to simplify the max derivation logic.
>>>>>
>>>>> The max policies get RSBA and RRSBA unconditionally set (with the EIBRS
>>>>> dependency maybe hiding RRSBA).  We can run any VM, even if it has been told
>>>>> "somewhere you might run, Retpoline isn't safe".
>>>>>
>>>>> The default policies are more complicated.  A guest shouldn't see both bits,
>>>>> but it needs to see one if the current host suffers from any form of RSBA, and
>>>>> which bit it needs to see depends on whether eIBRS is visible or not.
>>>>> Therefore, the calculation must be performed after sanitise_featureset().
>>>>>
>>>>> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
>>>>> ---
>>>>> CC: Jan Beulich <jbeul...@suse.com>
>>>>> CC: Roger Pau Monné <roger....@citrix.com>
>>>>> CC: Wei Liu <w...@xen.org>
>>>>>
>>>>> v3:
>>>>>  * Minor commit message adjustment.
>>>>>  * Drop changes to recalculate_cpuid_policy().  Deferred to a later series.
>>>> With this dropped, with the title not saying "max/default", and with
>>>> the description also not mentioning "live" policies at all, I don't
>>>> think this patch is self-consistent (meaning in particular: leaving
>>>> aside the fact that there's no way right now to request e.g. both
>>>> RSBA and RRSBA for a guest; aiui it is possible for Dom0).
>>>>
>>>> As you may imagine I'm also curious why you decided to drop this.
>>> Because when I tried doing levelling in Xapi, I remembered why I did it
>>> the way I did in v1, and why the v2 way was wrong.
>>>
>>> Xen cannot safely edit what the toolstack provides, so must not. 
>> And this is the part I don't understand: Why can't we correct the
>> (EIBRS,RSBA,RRSBA) tuple to a combination that is "legal"? At least
>> as long as ...
>>
>>> Instead, failing the set_policy() call is an option, and is what we want
>>> to do longterm,
>> ... we aren't there.
>>
>>> but it also happens to be wrong in this case.  An admin
>>> may know that a VM isn't using retpoline, and may need to migrate it
>>> anyway for a number of reasons, so any safety checks need to be in the
>>> toolstack, and need to be overrideable with something like --force.
>> Possibly leading to an inconsistent policy exposed to a guest? I
>> guess this may be the only option when we can't really resolve an
>> ambiguity, but that isn't the case here, is it?
> 
> Wrong.  Xen does not have any knowledge of other hosts the VM might
> migrate to.
> 
> So while Xen can spot problem combinations *on this host*, which way to
> correct the problem combination depends on where the VM might migrate to.

I actually view this as two different levels: with a flawed policy, the
guest is liable not to work correctly at all, so there's no point thinking
about whether it can migrate. With a fixed-up policy it may fail to
migrate, but it'll at least work otherwise.
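To make the fix-up idea concrete, here is a minimal sketch in C. It assumes a hypothetical `struct spec_policy` with one bool per CPUID bit (nothing like Xen's real policy layout), and `apply_deps()`, `derive_max()`, `derive_default()` and `fixup()` are illustrative names only, not Xen functions. It encodes the dependency, max and default rules from the commit message, and shows one possible correction of a requested (EIBRS, RSBA, RRSBA) tuple that keeps the "Retpoline not safe" information while picking the bit that matches EIBRS visibility.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the CPUID bits discussed here;
 * not Xen's actual policy structure. */
struct spec_policy {
    bool ibrsb;  /* IBRS/IBPB available */
    bool eibrs;  /* Enhanced IBRS (IBRS_ALL) */
    bool rsba;   /* RSB may use alternative predictors when empty */
    bool rrsba;  /* as RSBA, but restricted to the current prediction mode */
};

/* Feature dependencies: EIBRS requires IBRSB, RRSBA requires EIBRS.
 * Stands in for what sanitise_featureset() would enforce. */
static void apply_deps(struct spec_policy *p)
{
    if ( !p->ibrsb )
        p->eibrs = false;
    if ( !p->eibrs )
        p->rrsba = false;
}

/* Max policy: set both bits unconditionally and let the EIBRS
 * dependency hide RRSBA where appropriate. */
static void derive_max(struct spec_policy *p)
{
    p->rsba = true;
    p->rrsba = true;
    apply_deps(p);
}

/* Default policy: computed after the dependencies are applied.  If the
 * host has any form of RSBA, expose exactly one bit, chosen by whether
 * EIBRS is visible; otherwise expose neither (an assumption here). */
static void derive_default(struct spec_policy *p, bool host_any_rsba)
{
    apply_deps(p);
    p->rsba = host_any_rsba && !p->eibrs;
    p->rrsba = host_any_rsba && p->eibrs;
}

/* Hypothetical fix-up of a toolstack-supplied policy: if either bit
 * was requested, keep "Retpoline not safe" but re-express it with the
 * bit that is legal given EIBRS visibility. */
static void fixup(struct spec_policy *p)
{
    /* Sample before apply_deps() can clear RRSBA. */
    bool any = p->rsba || p->rrsba;

    derive_default(p, any);
}
```

For example, a request with IBRSB and EIBRS visible and both RSBA and RRSBA set comes out with only RRSBA, while the same request without EIBRS comes out with only RSBA. Either way the guest is still told Retpoline may be unsafe, but the choice is made from this host's view alone, which is exactly the limitation raised above about not knowing the migration targets.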

Jan
