On Thursday 13 November 2014 04:02 PM, David Gibson wrote:
> On Thu, Nov 13, 2014 at 11:28:30AM +0530, Aravinda Prasad wrote:
>>
>> On Thursday 13 November 2014 09:22 AM, David Gibson wrote:
>>> On Tue, Nov 11, 2014 at 12:14:31PM +0530, Aravinda Prasad wrote:
>>>> On Tuesday 11 November 2014 08:46 AM, David Gibson wrote:
>>>>> On Wed, Nov 05, 2014 at 12:43:15PM +0530, Aravinda Prasad wrote:
>>> [snip]
>>>>>> +    . = 0x200
>>>>>> +    /*
>>>>>> +     * Trampoline saves r3 in sprg2 and issues private hcall
>>>>>> +     * to request qemu to build error log. QEMU builds the
>>>>>> +     * error log, copies to rtas-blob and returns the address.
>>>>>> +     * The initial 16 bytes at the returned address consist of
>>>>>> +     * the saved srr0 and srr1, which we restore, and we pass
>>>>>> +     * the actual error log address on to the OS-handled
>>>>>> +     * machine check notification routine.
>>>>>> +     *
>>>>>> +     * All the below instructions are copied to interrupt vector
>>>>>> +     * 0x200 at the time of handling ibm,nmi-register rtas call.
>>>>>> +     */
>>>>>> +    mtsprg  2,3
>>>>>> +    li      3,0
>>>>>> +    /*
>>>>>> +     * ori r3,r3,KVMPPC_H_REPORT_MC_ERR. The KVMPPC_H_REPORT_MC_ERR
>>>>>> +     * value is patched below
>>>>>> +     */
>>>>>> +1:  ori     3,3,0
>>>>>> +    sc      1               /* Issue H_CALL */
>>>>>> +    cmpdi   cr0,3,0
>>>>>> +    beq     cr0,1b          /* retry KVMPPC_H_REPORT_MC_ERR */
>>>>>
>>>>> Having to retry the hcall from here seems very awkward. This is a
>>>>> private hcall, so you can define it to do whatever retries are
>>>>> necessary internally (and I don't think your current implementation
>>>>> can fail anyway).
>>>>
>>>> Retrying is required in the cases when multiple processors experience
>>>> a machine check at or about the same time. As per PAPR, subsequent
>>>> processors should serialize and wait for the first processor to issue
>>>> the ibm,nmi-interlock call. The second processor retries if the first
>>>> processor which received a machine check is still reading the error
>>>> log and is yet to issue the ibm,nmi-interlock call.
>>>
>>> Hmm.. ok. But I don't see any mechanism in the patches by which
>>> H_REPORT_MC_ERR will report failure if another CPU has an MC in
>>> progress.
>>
>> h_report_mc_err returns 0 if another VCPU is processing a machine
>> check, and in that case we retry. h_report_mc_err returns the error
>> log address if no other VCPU is processing a machine check.
>
> Uh.. how? I'm only seeing one return statement in the implementation
> in 3/4.

This part is in 4/4, which also handles the ibm,nmi-interlock call. In
h_report_mc_err():

+    if (mc_in_progress == 1) {
+        return 0;
+    }
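To make the serialization concrete, here is a rough sketch of how the
two sides could fit together (the hcall signature, the build_error_log()
helper and the interlock handler name are illustrative guesses, not the
exact code from 3/4 and 4/4; only the mc_in_progress check above is
quoted verbatim):

    /* Sketch only: one VCPU at a time owns the guest error log */
    static int mc_in_progress;

    static target_ulong h_report_mc_err(PowerPCCPU *cpu,
                                        sPAPREnvironment *spapr,
                                        target_ulong opcode,
                                        target_ulong *args)
    {
        if (mc_in_progress == 1) {
            return 0;           /* busy: the guest trampoline retries */
        }
        mc_in_progress = 1;
        /* Build the error log in the rtas blob; its first 16 bytes
         * hold the SRR0/SRR1 saved at machine check time. */
        return build_error_log(cpu);
    }

    /* ibm,nmi-interlock from the first CPU releases the log */
    static void rtas_ibm_nmi_interlock(void)
    {
        mc_in_progress = 0;
    }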
>
>>>> Retrying cannot be done internally in the h_report_mc_err hcall:
>>>> only one thread can succeed in entering qemu upon a parallel hcall,
>>>> and hence retrying inside the hcall will not allow the
>>>> ibm,nmi-interlock from the first CPU to succeed.
>>>
>>> It's possible, but would require some fiddling inside the h_call to
>>> unlock and wait for the other CPUs to finish, so yes, it might be
>>> more trouble than it's worth.
>>>
>>>>>> +    mtsprg  2,4
>>>>>
>>>>> Um.. doesn't this clobber the value of r3 you saved in SPRG2 just
>>>>> above?
>>>>
>>>> The r3 saved in SPRG2 is moved to the rtas area in the private
>>>> hcall, and hence it is fine to clobber r3 here.
>>>
>>> Ok, if you're going to do some magic register saving inside the
>>> HCALL, why not do the SRR[01] and CR restoration inside there as
>>> well.
>>
>> SRR0/1 is clobbered while returning from the HCALL and hence cannot
>> be restored in the HCALL. For CR, we need to do the restoration here,
>> as we clobber CR after returning from the HCALL (the instruction
>> checking the return value of the hcall clobbers CR).
>
> Hrm. AFAICT SRR0/1 shouldn't be clobbered when returning from an

As the hcall is an interrupt, SRR0 is set to nip and SRR1 to msr just
before executing rfid, so the values saved at the time of the machine
check cannot be restored inside the hcall.

> hcall. You're right about CR though. Or more precisely, you can't
> both restore CR and use it to return the success state of the hcall.
> Well.. actually, you could use crN to return the hcall success, since
> you're only restoring cr0 anyway (although only restoring cr0 seems
> odd to me in the first place).
>

Yes, it is possible to return the hcall success state through crN and
restore the entire CR register after checking.
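For reference, a possible shape for the trampoline if the hcall
returned its status in, say, cr1 (purely illustrative; the series as
posted returns the status in r3 and tests it with cmpdi/beq on cr0):

    mtsprg  2,3             /* r3 saved; moved to rtas area by the hcall */
1:  li      3,0
    ori     3,3,0           /* patched with KVMPPC_H_REPORT_MC_ERR */
    sc      1               /* hypervisor returns busy/success in cr1 */
    beq     cr1,1b          /* busy: retry without clobbering cr0 */
    /*
     * r3 now holds the error log address, and the full CR saved by
     * the hcall can be restored before entering the OS handler.
     */

-- 
Regards,
Aravinda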