On 20/02/2024 17:27, John Allen wrote:
> On Wed, Feb 07, 2024 at 11:21:05AM +0000, Joao Martins wrote:
>> On 12/09/2023 22:18, John Allen wrote:
>>> In the event that a guest process attempts to access memory that has
>>> been poisoned in response to a deferred uncorrected MCE, an AMD system
>>> will currently generate a SIGBUS error which will result in the entire
>>> guest being shut down. Ideally, we only want to kill the guest process
>>> that accessed poisoned memory in this case.
>>>
>>> This support has been included in qemu for Intel hosts for a long time,
>>> but there are a couple of changes needed for AMD hosts. First, we will
>>> need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
>>> the MCE injection code to avoid Intel specific behavior when we are
>>> running on an AMD host.
>>>
>>
>> Is there any update with respect to this series?
>>
>> John's series should fix MCE injection on AMD; as today it is just crashing
>> the guest (sadly) when an MCE happens in the hypervisor.
>>
>> William, Paolo, I think the sort-of-dependency(?) of this where we block
>> migration if there was a poisoned page on is already in Peter's migration
>> tree[1] (CC'ed). So perhaps this series just needs John to resend it given
>> that it's been a couple months since v4?
> 
> It looks like this series still applies cleanly to latest qemu, but I
> can resend if needed.
> 
That's great I suppose.

I was hoping Paolo would respond, so that we understand the next steps.

There's also the other kernel patch that Paolo suggested[0], to declare the
SUCCOR bit in the kvm supported CPUID? Maybe it's being held up because of that?

[0]
https://lore.kernel.org/qemu-devel/d4c1bb9b-8438-ed00-c79d-e8ad2a7e4...@redhat.com/

> Thanks,
> John
> 
>>
>> [1]
>> https://lore.kernel.org/qemu-devel/20240130190640.139364-2-william.ro...@oracle.com/
>>
>>> v2:
>>>   - Add "succor" feature word.
>>>   - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
>>>
>>> v3:
>>>   - Reorder series. Only enable SUCCOR after bugs have been fixed.
>>>   - Introduce new patch ignoring AO errors.
>>>
>>> v4:
>>>   - Remove redundant check for AO errors.
>>>
>>> John Allen (2):
>>>   i386: Fix MCE support for AMD hosts
>>>   i386: Add support for SUCCOR feature
>>>
>>> William Roche (1):
>>>   i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest
>>>
>>>  target/i386/cpu.c     | 18 +++++++++++++++++-
>>>  target/i386/cpu.h     |  4 ++++
>>>  target/i386/helper.c  |  4 ++++
>>>  target/i386/kvm/kvm.c | 28 ++++++++++++++++++++--------
>>>  4 files changed, 45 insertions(+), 9 deletions(-)
>>>
>>

