On 12/11/15 19:49, Aravinda Prasad wrote:
>
> On Thursday 12 November 2015 03:10 PM, Thomas Huth wrote:
...
>> Also LoPAPR talks about 'subsequent processors report "fatal error
>> previously reported"', so maybe the other processors should report that
>> condition in this case?
>
> I feel guest kernel is responsible for that or does that mean that qemu
> should report the same error, which first processor encountered, for
> subsequent processors? In that case what if the error encountered by
> first processor was recovered.

I simply referred to this text in LoPAPR:

 Multiple processors of the same OS image may experience
 fatal events at, or about, the same time. The first processor
 to enter the machine check handling firmware reports the fatal
 error. Subsequent processors serialize waiting for the first
 processor to issue the ibm,nmi-interlock call. These subsequent
 processors report "fatal error previously reported".

Is there code in the host kernel already that takes care of this (I
haven't checked)? If so, how does the host kernel know that the event
happened "at or about the same time" since you're checking at the QEMU
side for the mutex condition?

>> And of course you've also got to check that the same CPU is not getting
>> multiple NMIs before the interlock function has been called again.
>
> I think it is good to check that. However, shouldn't the guest enable ME
> until it calls interlock function?

First, the hypervisor should never trust the guest to do the right
things. Second, LoPAPR says "the OS permanently relinquishes to firmware
the Machine State Register's Machine Check Enable bit", and Paul also
said something similar in another mail to this thread, so I think you
really have to check this in QEMU instead.

 Thomas
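
To make the two checks discussed above concrete, here is a rough sketch
of the bookkeeping, not actual QEMU code. All names (mc_interlock,
mc_deliver, mc_nmi_interlock, the MC_REPORT_* values) are made up for
illustration. It is single-threaded, it only classifies an incoming
machine check, and it leaves out the actual "serialize waiting" step
that LoPAPR describes for the subsequent processors.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical classification of a machine check delivered to a vCPU. */
enum mc_report {
    MC_REPORT_FATAL_ERROR,         /* first CPU in: report the real error */
    MC_REPORT_PREVIOUSLY_REPORTED, /* another CPU while interlock is held */
    MC_REPORT_NESTED               /* same CPU again before the interlock call */
};

/* Hypothetical state: which CPU currently holds the NMI interlock. */
struct mc_interlock {
    int owner; /* CPU index, or NO_OWNER when the interlock is free */
};

static void mc_interlock_init(struct mc_interlock *s)
{
    s->owner = NO_OWNER;
}

/* Classify a machine check arriving on 'cpu' and take the interlock if free. */
static enum mc_report mc_deliver(struct mc_interlock *s, int cpu)
{
    assert(cpu >= 0);

    if (s->owner == NO_OWNER) {
        s->owner = cpu;
        return MC_REPORT_FATAL_ERROR;
    }
    if (s->owner == cpu) {
        /* The guest has not called ibm,nmi-interlock yet on this CPU. */
        return MC_REPORT_NESTED;
    }
    return MC_REPORT_PREVIOUSLY_REPORTED;
}

/* Model of the ibm,nmi-interlock call: only the owner may release. */
static bool mc_nmi_interlock(struct mc_interlock *s, int cpu)
{
    if (s->owner != cpu) {
        return false; /* do not trust the guest: ignore a bogus release */
    }
    s->owner = NO_OWNER;
    return true;
}
```

The point of keeping the owner on the hypervisor side is that neither
check depends on the guest having left MSR[ME] in any particular state.
What should actually happen in the MC_REPORT_NESTED case is a separate
question that this sketch does not answer.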