Hi Paul, [Added Shreyas's current e-mail address ]
On Fri, Oct 21, 2016 at 08:03:05PM +1100, Paul Mackerras wrote: > Commit 8117ac6a6c2f ("powerpc/powernv: Switch off MMU before entering > nap/sleep/rvwinkle mode", 2014-12-10) fixed a race condition where one > thread entering a KVM guest could switch the MMU context to the guest > while another thread was still in host kernel context with the MMU on. > That commit moved the point where a thread entering a power-saving > mode set its kvm_hstate.hwthread_state field in its PACA to > KVM_HWTHREAD_IN_IDLE from a point where the MMU was on to after the > MMU had been switched off. That commit also added a comment > explaining that we have to switch to real mode before setting > hwthread_state to avoid this race. > > Nevertheless, commit 4eae2c9ae54a ("powerpc/powernv: Make > pnv_powersave_common more generic", 2016-07-08) subsequently moved > the setting of hwthread_state back to a point where the MMU is on, > thus reintroducing the race, despite the comment saying that this > should not be done being included in full in the context lines of > the patch that did it. > Sorry about missing that part. I am at fault, since I reviewed 4eae2c9ae54a patch. Will keep this in mind in the future. > This fixes the race again and adds a bigger and shoutier comment > explaining the potential race condition. > > Cc: sta...@vger.kernel.org # v4.8 > Fixes: 4eae2c9ae54a > Signed-off-by: Paul Mackerras <pau...@ozlabs.org> > --- > arch/powerpc/kernel/idle_book3s.S | 32 ++++++++++++++++++++++++++------ > 1 file changed, 26 insertions(+), 6 deletions(-) > > diff --git a/arch/powerpc/kernel/idle_book3s.S > b/arch/powerpc/kernel/idle_book3s.S > index bd739fe..0d8712a 100644 > --- a/arch/powerpc/kernel/idle_book3s.S > +++ b/arch/powerpc/kernel/idle_book3s.S > @@ -163,12 +163,6 @@ _GLOBAL(pnv_powersave_common) > std r9,_MSR(r1) > std r1,PACAR1(r13) > > -#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE > - /* Tell KVM we're entering idle */ > - li r4,KVM_HWTHREAD_IN_IDLE > - stb r4,HSTATE_HWTHREAD_STATE(r13) > -#endif > - > /* > * Go to real mode to do the nap, as required by the architecture. > * Also, we need to be in real mode before setting hwthread_state, > @@ -185,6 +179,26 @@ _GLOBAL(pnv_powersave_common) > > .globl pnv_enter_arch207_idle_mode > pnv_enter_arch207_idle_mode: > +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE > + /* Tell KVM we're entering idle */ > + li r4,KVM_HWTHREAD_IN_IDLE > + /******************************************************/ > + /* N O T E W E L L ! ! ! N O T E W E L L */ > + /* The following store to HSTATE_HWTHREAD_STATE(r13) */ > + /* MUST occur in real mode, i.e. with the MMU off, */ > + /* and the MMU must stay off until we clear this flag */ > + /* and test HSTATE_HWTHREAD_REQ(r13) in the system */ > + /* reset interrupt vector in exceptions-64s.S. */ > + /* The reason is that another thread can switch the */ > + /* MMU to a guest context whenever this flag is set */ > + /* to KVM_HWTHREAD_IN_IDLE, and if the MMU was on, */ > + /* that would potentially cause this thread to start */ > + /* executing instructions from guest memory in */ > + /* hypervisor mode, leading to a host crash or data */ > + /* corruption, or worse. */ > + /******************************************************/ > + stb r4,HSTATE_HWTHREAD_STATE(r13) > +#endif > stb r3,PACA_THREAD_IDLE_STATE(r13) > cmpwi cr3,r3,PNV_THREAD_SLEEP > bge cr3,2f > @@ -250,6 +264,12 @@ enter_winkle: > * r3 - requested stop state > */ > power_enter_stop: > +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE > + /* Tell KVM we're entering idle */ > + li r4,KVM_HWTHREAD_IN_IDLE > + /* DO THIS IN REAL MODE! See comment above. */ > + stb r4,HSTATE_HWTHREAD_STATE(r13) > +#endif > /* > * Check if the requested state is a deep idle state. > */ > -- > 2.7.4 >