On Thu, May 01, 2014 at 02:49:01PM -0400, Vince Weaver wrote:
> 
> OK, humor me a bit here.
> 
> I'm looking at the buggy trace and comparing against a "good" trace where 
> the bug doesn't happen.
> 
> It is a race condition of sorts, because it's just a 10us or so 
> interleaving of calls that causes the bug to happen or not.
> 
> In the good trace:
> 
>       [parent] __perf_event_task_sched_out (and hence perf_swevent_del)
>       [child]  perf_release
> 
> In the buggy trace:
> 
>       [child] perf_release
>       [parent] __perf_event_task_sched_out (perf_swevent_del never happens)
> 
> 
> perf_swevent_del calls
>       hlist_del_rcu(event->hlist_entry)
> to remove the event from the swevent hlist.
> 
> Now in theory perf_release() calls sw_perf_event_destroy() which you
> would think would also call the above.  Instead it does
>        swevent_hlist_put_cpu(event, cpu);
> which does all kinds of weird hash stuff that I don't follow.
> 
> Should the above two be equivalent?  Is it reference counting in there 
> with if (!--swhash->hlist_refcount) causing the issue?

perf_release()
  put_event()
    perf_remove_from_context()
      __perf_remove_from_context()
        event_sched_out()
          ->del()

is the path that would call ->del() and hlist_del_rcu().

Now perf_remove_from_context() only calls __perf_remove_from_context()
when the task is active somewhere, otherwise it simply calls
list_del_event().
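
A rough user-space sketch of that branch, purely as a model and not the
kernel code: struct model_event, its two flags, and the model_*() helpers
are all hypothetical names made up for illustration, and the active path is
assumed to do the ->del() step followed by the list removal.

```c
#include <stdbool.h>

/* Hypothetical user-space model; these are not the kernel's structures. */
struct model_event {
	bool on_hlist;     /* stands in for event->hlist_entry being hashed */
	bool on_ctx_list;  /* stands in for membership in the context's list */
};

/* Models ->del(), i.e. perf_swevent_del() doing hlist_del_rcu(). */
static void model_del(struct model_event *e)
{
	e->on_hlist = false;
}

/* Models list_del_event(): only detaches from the context list. */
static void model_list_del_event(struct model_event *e)
{
	e->on_ctx_list = false;
}

/*
 * Models the branch in perf_remove_from_context() described above.
 * Assumption: the active path ends with the same list removal.
 */
static void model_remove_from_context(struct model_event *e, bool task_active)
{
	if (task_active)
		model_del(e);	/* __perf_remove_from_context() -> event_sched_out() -> ->del() */
	model_list_del_event(e);
}
```

In this model the inactive path never touches on_hlist, so it is only safe
if an earlier sched-out already ran ->del() for the event.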

Both perf_remove_from_context() and perf_event_context_sched_out() (as
called from __perf_event_task_sched_out) hold ctx->lock, so they should
be serialized against each other.

Clearly I'm missing something though, will go stare at the trace now.
