----- On Jan 14, 2019, at 8:09 AM, Peter Zijlstra pet...@infradead.org wrote:
> On Thu, Jan 10, 2019 at 12:45:27PM -0500, Mathieu Desnoyers wrote: > >> static void perf_pending_event(struct irq_work *entry) >> { >> struct perf_event *event = container_of(entry, >> struct perf_event, pending); >> int rctx; >> >> rctx = perf_swevent_get_recursion_context(); >> /* >> * If we 'fail' here, that's OK, it means recursion is already >> disabled >> * and we won't recurse 'further'. >> */ >> >> if (event->pending_disable) { >> event->pending_disable = 0; >> perf_event_disable_local(event); >> } >> >> if (event->pending_wakeup) { >> event->pending_wakeup = 0; >> perf_event_wakeup(event); >> } >> >> if (rctx >= 0) >> perf_swevent_put_recursion_context(rctx); >> } >> >> One side-effect of perf_event_wakeup() is to generate a sched_waking >> event. But I suspect it won't be traced by perf because it is invoked before >> putting the recursion context. >> >> Is there a reason why the wakeup is done before putting the recursion >> context ? > > d525211f9d1b ("perf: Fix irq_work 'tail' recursion") > > If we were to allow perf_event_wakeup() to generate its tracepoint, we'd > be back to square #1, no? Considering that perf tracing code has side-effects that generate additional events, it's indeed best not to trace them, otherwise you indeed end up with tail recursion. Can ftrace end up in the same situation through rb_wake_up_waiters() ? I suspect the tail recursion would be hard to trigger if the wakeup only happens once per page though, unless the events generated end up filling up a page. FWIW, LTTng avoids this entire issue by using a timer-based polling mechanism to ensure the tracing code does not call into the scheduler wakeup. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com