On 01.06.2023 16:48, Andrew Cooper wrote:
> @@ -593,15 +596,85 @@ static bool __init retpoline_calculations(void)
>          return false;
>  
>      /*
> -     * RSBA may be set by a hypervisor to indicate that we may move to a
> -     * processor which isn't retpoline-safe.
> +     * The meaning of the RSBA and RRSBA bits have evolved over time.  The
> +     * agreed upon meaning at the time of writing (May 2023) is thus:
> +     *
> +     * - RSBA (RSB Alternative) means that an RSB may fall back to an
> +     *   alternative predictor on underflow.  Skylake uarch and later all 
> have
> +     *   this property.  Broadwell too, when running microcode versions prior
> +     *   to Jan 2018.
> +     *
> +     * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces
> +     *   tagging of predictions with the mode in which they were learned.  So
> +     *   when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA).
> +     *
> +     * - CPUs are not expected to enumerate both RSBA and RRSBA.
> +     *
> +     * Some parts (Broadwell) are not expected to ever enumerate this
> +     * behaviour directly.  Other parts have differing enumeration with
> +     * microcode version.  Fix up Xen's idea, so we can advertise them safely
> +     * to guests, and so toolstacks can level a VM safety for migration.
> +     *
> +     * The following states exist:
> +     *
> +     * |   | RSBA | EIBRS | RRSBA | Notes              | Action        |
> +     * |---+------+-------+-------+--------------------+---------------|
> +     * | 1 |    0 |     0 |     0 | OK (older parts)   | Maybe +RSBA   |
> +     * | 2 |    0 |     0 |     1 | Broken             | +RSBA, -RRSBA |
> +     * | 3 |    0 |     1 |     0 | OK (pre-Aug ucode) | +RRSBA        |
> +     * | 4 |    0 |     1 |     1 | OK                 |               |
> +     * | 5 |    1 |     0 |     0 | OK                 |               |
> +     * | 6 |    1 |     0 |     1 | Broken             | -RRSBA        |
> +     * | 7 |    1 |     1 |     0 | Broken             | -RSBA, +RRSBA |
> +     * | 8 |    1 |     1 |     1 | Broken             | -RSBA         |
>       *
> +     * However, we doesn't need perfect adherence to the spec.  Identify the

Nit: "don't" or "it"?

> +     * broken cases (so we stand a chance of spotting violated assumptions),
> +     * and fix up Rows 1 and 3 so Xen can use RSBA || RRSBA to identify
> +     * "alternative predictors potentially in use".

Considering that it's rows 2, 6, 7, and 8 which are broken, I find this
comment a little misleading. To me it doesn't become clear whether them
subsequently being left alone (and merely a log message issued) is
intentional.

> +     */
> +    if ( cpu_has_eibrs ? cpu_has_rsba  /* Rows 7, 8 */
> +                       : cpu_has_rrsba /* Rows 2, 6 */ )
> +        printk(XENLOG_ERR
> +               "FIRMWARE BUG: CPU %02x-%02x-%02x, ucode 0x%08x: RSBA %u, 
> EIBRS %u, RRSBA %u\n",
> +               boot_cpu_data.x86, boot_cpu_data.x86_model,
> +               boot_cpu_data.x86_mask, ucode_rev,
> +               cpu_has_rsba, cpu_has_eibrs, cpu_has_rrsba);
> +
> +    /*
>       * Processors offering Enhanced IBRS are not guarenteed to be
>       * repoline-safe.
>       */
> -    if ( cpu_has_rsba || cpu_has_eibrs )
> +    if ( cpu_has_eibrs )
> +    {
> +        /*
> +         * Prior to the August 2023 microcode, many eIBRS-capable parts did
> +         * not enumerate RRSBA.
> +         */
> +        if ( !cpu_has_rrsba )
> +            setup_force_cpu_cap(X86_FEATURE_RRSBA);
> +
> +        return false;
> +    }

No clearing of RSBA in this case? I fear we may end up with misbehavior if
our own records aren't kept consistent with our assumptions. (This then
extends to the "just a log message" above as well.)

Also the inner conditional continues to strike me as odd; could you add
half a sentence to the comment (or description) as to meaning to leave
is_forced_cpu_cap() function correctly (which in turn raises the question
whether - down the road - this is actually going to matter)?

> +    /*
> +     * RSBA is explicitly enumerated in some cases, but may also be set by a
> +     * hypervisor to indicate that we may move to a processor which isn't
> +     * retpoline-safe.
> +     */
> +    if ( cpu_has_rsba )
>          return false;
>  
> +    /*
> +     * At this point, we've filtered all the legal RSBA || RRSBA cases (or 
> the
> +     * known non-ideal cases).  If ARCH_CAPS is visible, trust the absence of
> +     * RSBA || RRSBA.  There's no known microcode which advertises ARCH_CAPS
> +     * without RSBA or EIBRS, and if we're virtualised we can't rely the 
> model
> +     * check anyway.
> +     */

I think "no known" wants further qualification: When IBRSB was first
introduced, EIBRS and RSBA weren't even known about. I also didn't
think all hardware (i.e. sufficiently old one) did get newer ucode
when these started to be known. Possibly you mean "... which wrongly
advertises ..."?

> @@ -689,6 +762,15 @@ static bool __init retpoline_calculations(void)
>          break;
>      }
>  
> +    if ( !safe )
> +    {
> +        /*
> +         * Note: the eIBRS-capable parts are filtered out earlier, so the
> +         * remainder here are the ones which suffer only RSBA behaviour.
> +         */

As I think I had mentioned already, I find "only" here odd, when RSBA
is more severe than RRSBA. Maybe the "only" could move earlier, e.g.
between "are" and "the"? Then again this may be another non-native-
speaker issue of mine ...

Jan

Reply via email to