On 01.06.2023 16:48, Andrew Cooper wrote: > @@ -593,15 +596,85 @@ static bool __init retpoline_calculations(void) > return false; > > /* > - * RSBA may be set by a hypervisor to indicate that we may move to a > - * processor which isn't retpoline-safe. > + * The meaning of the RSBA and RRSBA bits have evolved over time. The > + * agreed upon meaning at the time of writing (May 2023) is thus: > + * > + * - RSBA (RSB Alternative) means that an RSB may fall back to an > + * alternative predictor on underflow. Skylake uarch and later all > have > + * this property. Broadwell too, when running microcode versions prior > + * to Jan 2018. > + * > + * - All eIBRS-capable processors suffer RSBA, but eIBRS also introduces > + * tagging of predictions with the mode in which they were learned. So > + * when eIBRS is active, RSBA becomes RRSBA (Restricted RSBA). > + * > + * - CPUs are not expected to enumerate both RSBA and RRSBA. > + * > + * Some parts (Broadwell) are not expected to ever enumerate this > + * behaviour directly. Other parts have differing enumeration with > + * microcode version. Fix up Xen's idea, so we can advertise them safely > + * to guests, and so toolstacks can level a VM safety for migration. > + * > + * The following states exist: > + * > + * | | RSBA | EIBRS | RRSBA | Notes | Action | > + * |---+------+-------+-------+--------------------+---------------| > + * | 1 | 0 | 0 | 0 | OK (older parts) | Maybe +RSBA | > + * | 2 | 0 | 0 | 1 | Broken | +RSBA, -RRSBA | > + * | 3 | 0 | 1 | 0 | OK (pre-Aug ucode) | +RRSBA | > + * | 4 | 0 | 1 | 1 | OK | | > + * | 5 | 1 | 0 | 0 | OK | | > + * | 6 | 1 | 0 | 1 | Broken | -RRSBA | > + * | 7 | 1 | 1 | 0 | Broken | -RSBA, +RRSBA | > + * | 8 | 1 | 1 | 1 | Broken | -RSBA | > * > + * However, we doesn't need perfect adherence to the spec. Identify the
Nit: "don't" or "it"? > + * broken cases (so we stand a chance of spotting violated assumptions), > + * and fix up Rows 1 and 3 so Xen can use RSBA || RRSBA to identify > + * "alternative predictors potentially in use". Considering that it's rows 2, 6, 7, and 8 which are broken, I find this comment a little misleading. To me it doesn't become clear whether them subsequently being left alone (and merely a log message issued) is intentional. > + */ > + if ( cpu_has_eibrs ? cpu_has_rsba /* Rows 7, 8 */ > + : cpu_has_rrsba /* Rows 2, 6 */ ) > + printk(XENLOG_ERR > + "FIRMWARE BUG: CPU %02x-%02x-%02x, ucode 0x%08x: RSBA %u, > EIBRS %u, RRSBA %u\n", > + boot_cpu_data.x86, boot_cpu_data.x86_model, > + boot_cpu_data.x86_mask, ucode_rev, > + cpu_has_rsba, cpu_has_eibrs, cpu_has_rrsba); > + > + /* > * Processors offering Enhanced IBRS are not guarenteed to be > * repoline-safe. > */ > - if ( cpu_has_rsba || cpu_has_eibrs ) > + if ( cpu_has_eibrs ) > + { > + /* > + * Prior to the August 2023 microcode, many eIBRS-capable parts did > + * not enumerate RRSBA. > + */ > + if ( !cpu_has_rrsba ) > + setup_force_cpu_cap(X86_FEATURE_RRSBA); > + > + return false; > + } No clearing of RSBA in this case? I fear we may end up with misbehavior if our own records aren't kept consistent with our assumptions. (This then extends to the "just a log message" above as well.) Also the inner conditional continues to strike me as odd; could you add half a sentence to the comment (or description) as to meaning to leave is_forced_cpu_cap() function correctly (which in turn raises the question whether - down the road - this is actually going to matter)? > + /* > + * RSBA is explicitly enumerated in some cases, but may also be set by a > + * hypervisor to indicate that we may move to a processor which isn't > + * retpoline-safe. > + */ > + if ( cpu_has_rsba ) > return false; > > + /* > + * At this point, we've filtered all the legal RSBA || RRSBA cases (or > the > + * known non-ideal cases). If ARCH_CAPS is visible, trust the absence of > + * RSBA || RRSBA. There's no known microcode which advertises ARCH_CAPS > + * without RSBA or EIBRS, and if we're virtualised we can't rely the > model > + * check anyway. > + */ I think "no known" wants further qualification: When IBRSB was first introduced, EIBRS and RSBA weren't even known about. I also didn't think all hardware (i.e. sufficiently old one) did get newer ucode when these started to be known. Possibly you mean "... which wrongly advertises ..."? > @@ -689,6 +762,15 @@ static bool __init retpoline_calculations(void) > break; > } > > + if ( !safe ) > + { > + /* > + * Note: the eIBRS-capable parts are filtered out earlier, so the > + * remainder here are the ones which suffer only RSBA behaviour. > + */ As I think I had mentioned already, I find "only" here odd, when RSBA is more severe than RRSBA. Maybe the "only" could move earlier, e.g. between "are" and "the"? Then again this may be another non-native- speaker issue of mine ... Jan