On Thu, 10 Jan 2019 at 12:09, gengdongjiu <gengdong...@huawei.com> wrote:
> Peter, I summarize James's main idea, James think QEMU does not needs
> to check *something* if Qemu support firmware-first.
> What do we do for your comments?

Unless I'm missing something, the code in your most recent patchset
attempts to update an ACPI table when it gets the SIGBUS from the
host kernel without doing anything to check whether it has ever
created the ACPI table (and set up the QEMU global variable that
tells the code where it is in the guest memory) in the first place.
I don't see how that can work.

> >> I think one question here which it would be good to answer is:
> >> if we are modelling a guest and we haven't specifically provided
> >> it an ACPI table to tell it about memory errors, what do we do
> >> when we get a sigbus from the host? We have basically two choices:
> >> (1) send the guest an SError (aka asynchronous external abort)
> >> anyway (with no further info about what the memory error is)
> >
> > For an AR signal an external abort is valid. Its up to the implementation
> > whether these are synchronous or asynchronous. Qemu can only take a signal for
> > something that was synchronous, so you can choose between the two.
> > Synchronous external abort is marginally better as an unaware OS knows its
> > affects this thread, and may be able to kill it.
> > SError with an imp-def ESR is indistinguishable from 'part of the soc fell out',
> > and should always result in a panic().
> >
> >
> >> (2) just stop QEMU (as we would for a memory error in QEMU's
> >> own memory)
> >
> > This is also valid. A machine may take external-abort to EL3 and then
> > reboot/crash/burn.

We should decide which of these we want to do, and have a comment
explaining what we're doing. If I'm reading your current patchset
correctly, it does neither -- if it can't record the fault in the
ACPI table it just ignores it without either stopping QEMU or
delivering an SError.

I think I favour option (2) here.

thanks
-- PMM
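
To make the check being asked for concrete, here is a minimal sketch
of the decision under option (2). It is illustrative only and is not
code from the patchset or from QEMU: the names MachineErrState,
MemErrAction and handle_guest_memory_error are hypothetical, and the
actual writing of the error record is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; nothing here is a real QEMU API. */
typedef enum {
    MEMERR_RECORDED,  /* fault can be recorded in the ACPI table */
    MEMERR_STOP_VM    /* option (2): no table exists, so stop QEMU */
} MemErrAction;

typedef struct {
    bool table_present;        /* was the ACPI error table ever created? */
    unsigned long table_addr;  /* guest address of the table, if present */
} MachineErrState;

/*
 * Decide what to do when the host kernel reports a memory error
 * in guest RAM. The fault is never silently ignored: either there
 * is a table to record it in, or the caller must stop the VM.
 */
static MemErrAction handle_guest_memory_error(const MachineErrState *s,
                                              unsigned long fault_addr)
{
    assert(s != NULL);
    (void)fault_addr;  /* real code would put this in the error record */

    if (!s->table_present) {
        /* Nothing was set up to report through: take option (2). */
        return MEMERR_STOP_VM;
    }
    /* Real code would write the record at s->table_addr and notify the guest. */
    return MEMERR_RECORDED;
}
```

The point of the sketch is only that the "no table" case takes an
explicit, commented branch instead of falling through.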