On Thu, Feb 27, 2014 at 12:00:14PM -0500, Steven Rostedt wrote:
> On Thu, 27 Feb 2014 17:37:32 +0100
> Frederic Weisbecker <fweis...@gmail.com> wrote:
> 
> > On Thu, Feb 27, 2014 at 10:46:18AM -0500, Steven Rostedt wrote:
> > > [Request for Ack]
> > > 
> > > From: Petr Mladek <pmla...@suse.cz>
> > > 
> > > If a failure occurs while modifying an ftrace function, it bails out and
> > > removes the tracepoints to restore the code to what it originally was.
> > > 
> > > The final sync run across the CPUs is missing after the fix-up is done
> > > and before the ftrace int3 handler flag is reset.
> > 
> > So IIUC the risk is that other CPUs may spuriously ignore non-ftrace traps
> > if we don't sync the other cores after reverting the int3 and before
> > decrementing the modifying_ftrace_code counter?
> 
> Actually, the bug is that they will not ignore the ftrace traps after
> we decrement the modifying_ftrace_code counter. Here's the race:
> 
>       CPU0                            CPU1
>       ----                            ----
>   remove_breakpoint();
>   modifying_ftrace_code = 0;
> 
>                               [still sees breakpoint]
>                               <takes trap>
>                               [sees modifying_ftrace_code as zero]
>                               [no breakpoint handler]
>                               [goto failed case]
>                               [trap exception - kernel breakpoint, no
>                                handler]
>                               BUG()
> 
> 
> Even if we had an smp_wmb() after removing the breakpoint and before
> clearing modifying_ftrace_code, we would still need an smp_rmb() on the
> other CPUs. The run_sync() does an IPI on all CPUs, which provides the
> smp_rmb().

Ah ok. My understanding was indeed that it doesn't ignore the ftrace trap,
but I thought the consequence was that we return immediately from the trap
handler.

> 
> > 
> > > 
> > > Link: http://lkml.kernel.org/r/1393258342-29978-2-git-send-email-pmla...@suse.cz
> > > 
> > > Fixes: 8a4d0a687a5 "ftrace: Use breakpoint method to update ftrace caller"
> > > Cc: sta...@vger.kernel.org # 3.5+
> > > Signed-off-by: Petr Mladek <pmla...@suse.cz>
> > > Signed-off-by: Steven Rostedt <rost...@goodmis.org>
> > > ---
> > >  arch/x86/kernel/ftrace.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> > > index 6b566c8..69885e2 100644
> > > --- a/arch/x86/kernel/ftrace.c
> > > +++ b/arch/x86/kernel/ftrace.c
> > > @@ -660,8 +660,8 @@ ftrace_modify_code(unsigned long ip, unsigned const char *old_code,
> > >           ret = -EPERM;
> > >           goto out;
> > >   }
> > > - run_sync();
> > >   out:
> > > + run_sync();
> > >   return ret;
> > >  
> > >   fail_update:
> > 
> > This could be further optimized by calling run_sync() at the end of the
> > fail_update block instead (after the probe_kernel_write() revert);
> > otherwise even a failure to set the breakpoint results in a run_sync(),
> > which doesn't appear to be needed. But that's really just nitpicking, as
> > it's a rare failure code path and shouldn't hurt.
> 
> No, the run_sync() must be done after removing the breakpoint. Again,
> we don't want one of these breakpoints to be called on another CPU and
> then see modifying_ftrace_code as zero. That is bad. The final
> run_sync() is required.

OK, but what I meant was to do this instead:

 fail_update:
    probe_kernel_write((void *)ip, &old_code[0], 1);
+   run_sync();
    goto out;

Because with the current patch we also call run_sync() on add_break() failure.

> 
> I think I'll update the change log to include my race flow graph from
> above.
> 
> -- Steve
> 
> 
> > 
> > In any case, the fix looks correct.
> > 
> > Acked-by: Frederic Weisbecker <fweis...@gmail.com>
> > 
> > > -- 
> > > 1.8.5.3
> > > 
> > > 
> 