On 17.11.2025 23:21, Andrew Cooper wrote:
> wait_for_state() returns false on encountering LOADING_EXIT.
> control_thread_fn() can move directly to this state in the case of an early
> error. It is not an error condition for APs, but right now the latest write
> into stopmachine_data.fn_result wins, causing the real error, -EIO, to get
> clobbered with -EBUSY. e.g.:
>
> # xen-ucode /lib/firmware/amd-ucode/microcode_amd_fam17h.bin --force
> Failed to update microcode. (err: Device or resource busy)
>
> (XEN) 256 cores are to update their microcode
> (XEN) microcode: CPU0 update rev 0x830107d to 0x830107c failed, result
> 0x830107d
> (XEN) Late loading aborted: CPU0 failed to update ucode: -5
>
> Drop all the -EBUSY's, and treat hitting LOADING_EXIT as a success case. This
> causes only a single error to be returned through stop_machine_run(). e.g.:
Why "single"? stop_machine_run() can't return multiple errors anyway, as it
has only a scalar return type. Or do you mean "a single, consistent" one?
> # xen-ucode /lib/firmware/amd-ucode/microcode_amd_fam17h.bin --force
> Failed to update microcode. (err: Input/output error)
>
> (XEN) 256 cores are to update their microcode
> (XEN) microcode: CPU0 update rev 0x830107d to 0x830107c failed, result
> 0x830107d
> (XEN) Late loading aborted: CPU0 failed to update ucode: -5
The sole difference being which specific error is observed, which looks to
support the above interpretation. What I don't quite understand is ...
> Fixes: 5ed12565aa32 ("microcode: rendezvous CPUs in NMI handler and load
> ucode")
... this and the specific indication that this needs backporting: Why is
the particular error code this important here?
> --- a/xen/arch/x86/cpu/microcode/core.c
> +++ b/xen/arch/x86/cpu/microcode/core.c
> @@ -260,7 +260,9 @@ static int secondary_nmi_work(void)
> {
> cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
>
> - return wait_for_state(LOADING_EXIT) ? 0 : -EBUSY;
> + wait_for_state(LOADING_EXIT);
> +
> + return 0;
> }
At which point the function could as well return void? Preferably with this
adjustment (and the knock-on one at the call site) and with the slight
clarification to the description:
Reviewed-by: Jan Beulich <[email protected]>
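
To illustrate the void variant I have in mind, here is a minimal standalone
sketch. It is not the actual Xen code: the enum, the non-spinning
wait_for_state() stub, nr_callin, and nmi_callback_model() are hypothetical
stand-ins so that the fragment compiles on its own, and the real call site
may well want to look different.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the Xen definitions. */
enum loading_state {
    LOADING_PREPARE,
    LOADING_CALLIN,
    LOADING_ENTER,
    LOADING_EXIT,
};

static enum loading_state loading_state = LOADING_EXIT;
static unsigned int nr_callin;

/* Non-spinning stub: the real helper waits for the state to be reached. */
static bool wait_for_state(enum loading_state state)
{
    return loading_state == state;
}

/* With no error left to report, the function can return void. */
static void secondary_nmi_work(void)
{
    nr_callin++; /* Stands in for the cpu_callin_map update. */

    wait_for_state(LOADING_EXIT);
}

/*
 * Hypothetical model of the knock-on change at the call site: only the
 * primary path can produce an error, secondaries always report success.
 */
static int nmi_callback_model(bool primary, int primary_ret)
{
    int ret = 0;

    if ( primary )
        ret = primary_ret;
    else
        secondary_nmi_work();

    return ret;
}
```

In this model the caller simply initialises its result to 0 and leaves it
untouched on the secondary path.
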
> @@ -271,7 +273,7 @@ static int primary_thread_work(const struct
> microcode_patch *patch,
> cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
>
> if ( !wait_for_state(LOADING_ENTER) )
> - return -EBUSY;
> + return 0;
>
> ret = alternative_call(ucode_ops.apply_microcode, patch, flags);
> if ( !ret )
> @@ -313,7 +315,7 @@ static int cf_check microcode_nmi_callback(
> static int secondary_thread_fn(void)
> {
> if ( !wait_for_state(LOADING_CALLIN) )
> - return -EBUSY;
> + return 0;
>
> self_nmi();
>
> @@ -336,7 +338,7 @@ static int primary_thread_fn(const struct microcode_patch
> *patch,
> unsigned int flags)
> {
> if ( !wait_for_state(LOADING_CALLIN) )
> - return -EBUSY;
> + return 0;
>
> if ( ucode_in_nmi )
> {
Vaguely recalling the original intentions, these changes looked wrong to me at
the first glance. But yes, an exit indication from the control thread isn't
really a separate error condition.
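
For reference, this is how I understand the clobbering, as a minimal
standalone model. It is not the stop_machine code: record_result() and
run_model() are hypothetical, and I am assuming that only nonzero results
get recorded and that the last writer wins, as the description implies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model; not Xen's stop_machine implementation. */
static int fn_result;

/* Assumed recording rule: any nonzero result overwrites the previous one. */
static void record_result(int ret)
{
    if ( ret )
        fn_result = ret;
}

/*
 * CPU0 fails with -EIO first. The remaining CPUs then observe the exit
 * state and report ap_exit_ret: -EBUSY before the patch, 0 with it.
 */
static int run_model(size_t nr_cpus, int ap_exit_ret)
{
    assert(nr_cpus > 0);

    fn_result = 0;
    record_result(-EIO);

    for ( size_t cpu = 1; cpu < nr_cpus; cpu++ )
        record_result(ap_exit_ret);

    return fn_result;
}
```

Under these assumptions run_model(256, -EBUSY) ends up with -EBUSY, while
run_model(256, 0) preserves the -EIO from CPU0.
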
Jan