On Wed, 2013-04-10 at 13:35 +0200, Peter Zijlstra wrote:
> On Fri, 2013-04-05 at 13:59 -0400, Olivier Langlois wrote:
> > Process timers are moving fasters than their corresponding
> > cpu clock for various reasons:
> >
> > 1. There is a race condition when getting a timer sample that makes the
> >    sample be smaller than it should leading to setting the timer
> >    expiration to soon.
> > 2. When initializing the cputimer, by including tasks deltas in the
> >    initial timer value, it makes them be counted twice.
> > 3. When a thread autoreap itself when exiting, the last context switch
> >    update will update the cputimer and not the overall process values
> >    stored in signal.
>
> Please explain these races. Things like task_sched_runtime() on which
> most of this stuff is build read both sum_exec_runtime and compute the
> delta while holding the rq->lock; this should avoid any and all races
> against update_curr() and the sort.
>

In my previous reply I explained the race condition at length, but I
didn't realize that you were also referring to my refactoring of
task_sched_runtime(), so let me comment a little more on that proposal:

Currently:

- cputimer is initialized with the result of thread_group_cputime(),
  which is (accounted time + tasks deltas)
- the cputimer sample value is then cputimer + 1 more task_delta_exec()
- after all active tasks have passed through update_curr(), cputimer is
  (accounted time + 2*(tasks deltas))

By being able to get the accounted time and the delta separately, you can:

- initialize cputimer to the accounted time
- take the thread group cputimer sample as cputimer + delta (which is
  essentially equivalent to what thread_group_cputime() would return)
- after all the deltas are in, through calls to
  account_group_exec_runtime(), end up with cputimer set to
  (accounted time + tasks deltas), exactly the same value as the
  corresponding process clock.

In other words, the way the cputimer is currently initialized contributes
to making it advance faster than its corresponding process clock.

This part of the patch has nothing to do with a race condition. As far as
I can tell, thread_group_cputime() and task_delta_exec() are rock solid.
It is just that you need the delta and the accounted time separately, and
preferably atomically, to be able to initialize a posix cpu timer
correctly.

Greetings,
Olivier
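
The double counting described above can be illustrated with a small user-space model. This is only a sketch and not kernel code: struct task_model and every function below are hypothetical names invented for the illustration, and flush_deltas() merely stands in for all tasks passing through update_curr() and account_group_exec_runtime().

```c
#include <stddef.h>

/* Hypothetical user-space model of the accounting; not kernel code. */
struct task_model {
	unsigned long long accounted; /* runtime already accounted */
	unsigned long long delta;     /* runtime not yet accounted */
};

/* Process clock: accounted time plus pending deltas, which is what the
 * mail describes thread_group_cputime() as returning. */
static unsigned long long process_clock(const struct task_model *t, size_t n)
{
	unsigned long long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += t[i].accounted + t[i].delta;
	return sum;
}

/* Accounted time only, without the pending deltas. */
static unsigned long long accounted_only(const struct task_model *t, size_t n)
{
	unsigned long long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += t[i].accounted;
	return sum;
}

/* Stand-in for every task passing through update_curr(): each pending
 * delta is folded into the task and also added to the cputimer. */
static void flush_deltas(struct task_model *t, size_t n,
			 unsigned long long *cputimer)
{
	for (size_t i = 0; i < n; i++) {
		t[i].accounted += t[i].delta;
		*cputimer += t[i].delta;
		t[i].delta = 0;
	}
}

/* Current scheme: the timer starts at (accounted time + deltas), so the
 * deltas are added a second time when they are flushed. */
unsigned long long timer_current_init(struct task_model *t, size_t n)
{
	unsigned long long timer = process_clock(t, n);

	flush_deltas(t, n, &timer);
	return timer;
}

/* Proposed scheme: the timer starts at the accounted time only, so each
 * delta is counted exactly once. */
unsigned long long timer_proposed_init(struct task_model *t, size_t n)
{
	unsigned long long timer = accounted_only(t, n);

	flush_deltas(t, n, &timer);
	return timer;
}
```

With two tasks at (accounted, delta) = (100, 10) and (200, 20), the current scheme leaves the timer ahead of the process clock by the sum of the deltas, while the proposed scheme ends up equal to it.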