On Thu, Nov 13, 2014 at 05:18:16PM +0530, Aravinda Prasad wrote:
> On Thursday 13 November 2014 04:02 PM, David Gibson wrote:
> > On Thu, Nov 13, 2014 at 11:28:30AM +0530, Aravinda Prasad wrote:

[snip]

> >>>>> Having to retry the hcall from here seems very awkward.  This is
> >>>>> a private hcall, so you can define it to do whatever retries are
> >>>>> necessary internally (and I don't think your current
> >>>>> implementation can fail anyway).
> >>>>
> >>>> Retrying is required in cases where multiple processors experience
> >>>> a machine check at or about the same time. As per PAPR, subsequent
> >>>> processors should serialize and wait for the first processor to
> >>>> issue the ibm,nmi-interlock call. A second processor retries if
> >>>> the first processor that received a machine check is still reading
> >>>> the error log and has yet to issue the ibm,nmi-interlock call.
> >>>
> >>> Hmm.. ok.  But I don't see any mechanism in the patches by which
> >>> H_REPORT_MC_ERR will report failure if another CPU has an MC in
> >>> progress.
> >>
> >> h_report_mc_err returns 0 if another VCPU is processing a machine
> >> check, and in that case we retry. h_report_mc_err returns the error
> >> log address if no other VCPU is processing a machine check.
> >
> > Uh.. how?  I'm only seeing one return statement in the
> > implementation in 3/4.
>
> This part is in 4/4, which handles the ibm,nmi-interlock call in
> h_report_mc_err():
>
> +    if (mc_in_progress == 1) {
> +        return 0;
> +    }
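To make the serialize-and-retry protocol concrete, here is a rough C
sketch of what the quoted check is part of (hypothetical names such as
rtas_error_log_addr, not the actual patch code; qemu's global lock
already serializes hcall handlers, so no extra locking is shown):

    #include <stdint.h>

    static int mc_in_progress;            /* guards the shared RTAS error log */
    static uint64_t rtas_error_log_addr;  /* hypothetical: guest-physical
                                           * address of the log, set at boot */

    static uint64_t h_report_mc_err(void)
    {
        if (mc_in_progress == 1) {
            return 0;                 /* log busy: the guest vector retries */
        }
        mc_in_progress = 1;           /* claim the log for this VCPU */
        return rtas_error_log_addr;   /* guest reads the error log from here */
    }

    /* ibm,nmi-interlock path: the first VCPU has finished reading the
     * log, so a retrying VCPU's next h_report_mc_err can now succeed. */
    static void h_nmi_interlock(void)
    {
        mc_in_progress = 0;
    }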
Ah, right, missed the change to h_report_mc_err() in the later patch.

> >>>> Retrying cannot be done internally in the h_report_mc_err hcall:
> >>>> only one thread can succeed in entering qemu on a parallel hcall,
> >>>> and hence retrying inside the hcall would not allow the
> >>>> ibm,nmi-interlock from the first CPU to succeed.
> >>>
> >>> It's possible, but would require some fiddling inside the h_call to
> >>> unlock and wait for the other CPUs to finish, so yes, it might be
> >>> more trouble than it's worth.
> >>>
> >>>>
> >>>>>
> >>>>>> +    mtsprg  2,4
> >>>>>
> >>>>> Um.. doesn't this clobber the value of r3 you saved in SPRG2 just
> >>>>> above?
> >>>>
> >>>> The r3 saved in SPRG2 is moved to the rtas area in the private
> >>>> hcall, and hence it is fine to clobber r3 here.
> >>>
> >>> Ok, if you're going to do some magic register saving inside the
> >>> HCALL, why not do the SRR[01] and CR restoration inside there as
> >>> well?
> >>
> >> SRR0/1 is clobbered while returning from the HCALL and hence cannot
> >> be restored in the HCALL. For CR, we need to do the restoration
> >> here, as we clobber CR after returning from the HCALL (the
> >> instruction checking the return value of the hcall clobbers CR).
> >
> > Hrm.  AFAICT SRR0/1 shouldn't be clobbered when returning from an
> > hcall.
>
> As the hcall is an interrupt, SRR0 is set to the nip and SRR1 to the
> msr just before executing rfid.

AFAICT the return path from the hypervisor - including for hcalls - uses
HSRR0/1 and hrfid, so ordinary SRR0/SRR1 should be ok.

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
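As a footnote to the register shuffling above, a rough C-level sketch of
the r3 handoff being described (hypothetical names and layout, not the
actual patch): the vector stashes r3 in SPRG2, the private hcall copies
it into the rtas area, and from then on both r3 and SPRG2 are fair game:

    #include <stdint.h>

    /* Rough sketch with hypothetical names -- not the actual patch code.
     * The sequence under discussion:
     *   1. FWNMI vector: mtsprg 2,3   (stash guest r3 in SPRG2)
     *   2. vector issues the private hcall, which clobbers r3 itself
     *   3. hcall copies the stashed r3 into the rtas register-save area
     *   4. vector is then free to reuse SPRG2, e.g. "mtsprg 2,4"
     */
    struct rtas_reg_save {
        uint64_t gpr3;            /* guest r3 at the time of the MC */
        /* ... other registers the FWNMI handler expects ... */
    };

    static void stash_guest_r3(uint64_t sprg2_val,
                               struct rtas_reg_save *save)
    {
        /* sprg2_val is the r3 the vector saved with "mtsprg 2,3" */
        save->gpr3 = sprg2_val;   /* r3 now lives in the rtas area */
    }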