On Wed, Feb 17, 2016 at 10:35:39AM +0100, Jiri Olsa wrote: > On Wed, Feb 17, 2016 at 04:34:16AM +0100, Oleg Nesterov wrote: > > Finally I reproduced... let me add CC's and reply to initial message. This > > has > > nothing to do with arm/uprobes. > > > > I simply can't understand how perf calculates ->total_time_enabled/running. > > At all. But the problem is that > > > > 1. perf_event_enable_on_exec() does enable first, then event_sched_in(). > > > > After that tstamp_enabled < tstamp_running > > > > 2. This means that after the next update_event_times() > > total_time_running < total_time_enabled > > > > again, I fail to understand these calculations, but this is what > > perf_event_read_value() reports to user-space. > > > > 3. /usr/bin/perf calls perf_counts_values__scale() which does > > > > else if (count->run < count->ena) { > > scaled = 1; > > count->val = (u64)((double) count->val * count->ena / > > count->run + 0.5); > > } > > > > and this is why you see the wrong number. count->val was correct but > > wrongly updated because total_time_running < total_time_enabled. > > > > I leave this to Peter and Jiri ;) > > I did not notice the ther conversation wasn't public, reposting ;-) > > jirka > > --- > ouch, I tested with fedora kernel.. I can reproduce with 4.5 > > Pratyush bisected this into following commit: > [3e349507d12de93b08b0aa814fc2aa0dee91c5ba] perf: Fix perf_enable_on_exec() > event scheduling > > it seems the commit above introduced unwanted difference between > counter's enabled and running times.. I'm checking on that ;-)
Does something like so work? --- So prior to 3e349507d12d ("perf: Fix perf_enable_on_exec() event scheduling") we used to call task_ctx_sched_out() before event_enable_on_exec(). ctx_sched_out() will call update_context_time(), therefore __perf_event_mark_enabled() would have an up-to-date ctx->time. Now, not so much. So explicitly update the ctx time before calling event_enable_on_exec(). ctx_resched() will again call update_context_time(), resulting in a slight difference the other way (running > enabled), which doesn't make any sense either, but that we can (and should) clip. --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -3173,6 +3173,10 @@ static void perf_event_enable_on_exec(in cpuctx = __get_cpu_context(ctx); perf_ctx_lock(cpuctx, ctx); + + update_context_time(ctx); + update_cgrp_time_from_cpuctx(cpuctx); + list_for_each_entry(event, &ctx->event_list, event_entry) enabled |= event_enable_on_exec(event, ctx);