On 06/08, Peter Zijlstra wrote:
>
> On Mon, Jun 08, 2015 at 11:14:17AM +0200, Peter Zijlstra wrote:
> > > Finally. Suppose that timer->function() returns HRTIMER_RESTART
> > > and hrtimer_active() is called right after __run_hrtimer() sets
> > > cpu_base->running = NULL. I can't understand why hrtimer_active()
> > > can't miss ENQUEUED in this case. We have wmb() in between, yes,
> > > but then hrtimer_active() should do something like
> > >
> > > 	active = cpu_base->running == timer;
> > > 	if (!active) {
> > > 		rmb();
> > > 		active = state != HRTIMER_STATE_INACTIVE;
> > > 	}
> > >
> > > No?
> >
> > Hmm, good point. Let me think about that. It would be nice to be able to
> > avoid more memory barriers.
>
> So your scenario is:
>
>                                  [R] seq
>                                  RMB
> [S] ->state = ACTIVE
> WMB
> [S] ->running = NULL
>                                  [R] ->running (== NULL)
>                                  [R] ->state (== INACTIVE; fail to observe
>                                               the ->state store due to
>                                               lack of order)
>                                  RMB
>                                  [R] seq (== seq)
> [S] seq++
>
> Conversely, if we re-order the (first) seq++ store such that it comes
> first:
>
> [S] seq++
>
>                                  [R] seq
>                                  RMB
>                                  [R] ->running (== NULL)
> [S] ->running = timer;
> WMB
> [S] ->state = INACTIVE
>                                  [R] ->state (== INACTIVE)
>                                  RMB
>                                  [R] seq (== seq)
>
> And we have another false negative.
>
> And in this case we need the read order the other way around, we'd need:
>
> 	active = timer->state != HRTIMER_STATE_INACTIVE;
> 	if (!active) {
> 		smp_rmb();
> 		active = cpu_base->running == timer;
> 	}
>
> Now I think we can fix this by either doing:
>
> 	WMB
> 	seq++
> 	WMB
>
> On both sides of __run_hrtimer(), or do
>
> bool hrtimer_active(const struct hrtimer *timer)
> {
> 	struct hrtimer_cpu_base *cpu_base;
> 	unsigned int seq;
>
> 	do {
> 		cpu_base = READ_ONCE(timer->base->cpu_base);
> 		seq = raw_read_seqcount(&cpu_base->seq);
>
> 		if (timer->state != HRTIMER_STATE_INACTIVE)
> 			return true;
>
> 		smp_rmb();
>
> 		if (cpu_base->running == timer)
> 			return true;
>
> 		smp_rmb();
>
> 		if (timer->state != HRTIMER_STATE_INACTIVE)
> 			return true;
>
> 	} while (read_seqcount_retry(&cpu_base->seq, seq) ||
> 		 cpu_base != READ_ONCE(timer->base->cpu_base));
>
> 	return false;
> }
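The triple-check reader quoted above can be modeled in user space with C11 atomics. This is a minimal sketch, assuming relaxed loads stand in for READ_ONCE() and acquire fences stand in for smp_rmb(); that is an approximation, since the C11 and kernel memory models are not identical. The struct layouts and every `model_*` name are hypothetical; only the field names follow the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; only the field names follow the thread. */
enum { HRTIMER_STATE_INACTIVE = 0, HRTIMER_STATE_ENQUEUED = 1 };

struct hrtimer;

struct cpu_base {
	atomic_uint seq;                   /* bumped around __run_hrtimer() */
	_Atomic(struct hrtimer *) running; /* timer whose callback is executing */
};

struct hrtimer {
	_Atomic(struct cpu_base *) cpu_base; /* stands in for timer->base->cpu_base */
	atomic_int state;
};

static void model_init(struct cpu_base *cb, struct hrtimer *t)
{
	atomic_init(&cb->seq, 0);
	atomic_init(&cb->running, NULL);
	atomic_init(&t->cpu_base, cb);
	atomic_init(&t->state, HRTIMER_STATE_INACTIVE);
}

/* smp_rmb() approximated by an acquire fence (not an exact equivalent). */
static void model_rmb(void)
{
	atomic_thread_fence(memory_order_acquire);
}

/* state, then running, then state again, all inside a seq read/retry loop. */
static bool model_hrtimer_active(struct hrtimer *timer)
{
	struct cpu_base *cpu_base;
	unsigned int seq;

	do {
		cpu_base = atomic_load_explicit(&timer->cpu_base, memory_order_relaxed);
		/* like raw_read_seqcount(): read seq, then rmb, no wait for even */
		seq = atomic_load_explicit(&cpu_base->seq, memory_order_relaxed);
		model_rmb();

		if (atomic_load_explicit(&timer->state, memory_order_relaxed) != HRTIMER_STATE_INACTIVE)
			return true;
		model_rmb();
		if (atomic_load_explicit(&cpu_base->running, memory_order_relaxed) == timer)
			return true;
		model_rmb();
		if (atomic_load_explicit(&timer->state, memory_order_relaxed) != HRTIMER_STATE_INACTIVE)
			return true;

		model_rmb(); /* read_seqcount_retry() starts with an rmb */
	} while (atomic_load_explicit(&cpu_base->seq, memory_order_relaxed) != seq ||
		 cpu_base != atomic_load_explicit(&timer->cpu_base, memory_order_relaxed));

	return false;
}
```

The assertions below exercise only the single-threaded logic (enqueued, running, neither). They say nothing about the reordering scenarios above, which need a litmus-test checker rather than a unit test.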

You know, I simply can't convince myself that I understand why this code
is correct... or not.

But contrary to what I said before, I agree that we need to recheck
timer->base. This probably needs more discussion; to me it is very
unobvious why we can trust this cpu_base != READ_ONCE() check. Yes, we
have a lot of barriers, but they do not pair with each other. Let's
ignore this for now.

> And since __run_hrtimer() is the more performance critical code, I think
> it would be best to reduce the amount of memory barriers there.

Yes, but wmb() is cheap on x86... Perhaps we can make this code
"obviously correct"?

How about the following? We add cpu_base->seq as before, but limit its
"write" scope so that we can use the regular read/retry. So,

	hrtimer_active(timer)
	{
		do {
			cpu_base = READ_ONCE(timer->base->cpu_base);
			seq = read_seqcount_begin(&cpu_base->seq);

			if (timer->state & HRTIMER_STATE_ENQUEUED ||
			    cpu_base->running == timer)
				return true;

		} while (read_seqcount_retry(&cpu_base->seq, seq) ||
			 cpu_base != READ_ONCE(timer->base->cpu_base));

		return false;
	}

And we need to avoid the races with the 2 transitions in __run_hrtimer().

The first race is trivial; we change __run_hrtimer() to do

	write_seqcount_begin(&cpu_base->seq);
	cpu_base->running = timer;
	__remove_hrtimer(timer);	// clears ENQUEUED
	write_seqcount_end(&cpu_base->seq);

and hrtimer_active() obviously can't race with this section.

Then we change enqueue_hrtimer():

+	bool need_lock = base->cpu_base->running == timer;
+	if (need_lock)
+		write_seqcount_begin(&base->cpu_base->seq);
+
 	timer->state |= HRTIMER_STATE_ENQUEUED;
+
+	if (need_lock)
+		write_seqcount_end(&base->cpu_base->seq);

Now, if the timer is re-queued by the time __run_hrtimer() clears
->running, we have the following sequence:

	write_seqcount_begin(&cpu_base->seq);
	timer->state |= HRTIMER_STATE_ENQUEUED;
	write_seqcount_end(&cpu_base->seq);

	cpu_base->running = NULL;

and I think this should equally work, because in this case we do not care
if hrtimer_active() misses "running = NULL".
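The proposal above can be sketched as a user-space model: the plain read_seqcount_begin()/read_seqcount_retry() reader, the single write section at the start of __run_hrtimer(), and the write section in enqueue_hrtimer() that is taken only when the timer's own callback is running. This is a minimal sketch under the same assumptions as any user-space model of kernel barriers: C11 release/acquire fences approximate smp_wmb()/smp_rmb(), and all `model_*` helpers and struct layouts are hypothetical stand-ins, not kernel API. It illustrates the control flow; it does not prove the scheme race-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; names follow the thread, not kernel headers. */
enum { HRTIMER_STATE_INACTIVE = 0, HRTIMER_STATE_ENQUEUED = 1 };

struct hrtimer;

struct cpu_base {
	atomic_uint seq;                   /* odd while a writer is inside */
	_Atomic(struct hrtimer *) running;
};

struct hrtimer {
	_Atomic(struct cpu_base *) cpu_base; /* stands in for timer->base->cpu_base */
	atomic_int state;
};

static void model_init_base(struct cpu_base *cb)
{
	atomic_init(&cb->seq, 0);
	atomic_init(&cb->running, NULL);
}

static void model_init_timer(struct hrtimer *t, struct cpu_base *cb)
{
	atomic_init(&t->cpu_base, cb);
	atomic_init(&t->state, HRTIMER_STATE_INACTIVE);
}

/* Seqcount primitives; fences approximate smp_rmb()/smp_wmb(). */
static unsigned int model_read_begin(atomic_uint *seq)
{
	unsigned int s;

	while ((s = atomic_load_explicit(seq, memory_order_relaxed)) & 1u)
		; /* a writer is inside: wait for an even value */
	atomic_thread_fence(memory_order_acquire);
	return s;
}

static bool model_read_retry(atomic_uint *seq, unsigned int start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(seq, memory_order_relaxed) != start;
}

static void model_write_begin(atomic_uint *seq)
{
	atomic_fetch_add_explicit(seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

static void model_write_end(atomic_uint *seq)
{
	atomic_thread_fence(memory_order_release);
	atomic_fetch_add_explicit(seq, 1, memory_order_relaxed);
}

/* Proposed reader: regular read/retry plus the cpu_base recheck. */
static bool model_hrtimer_active(struct hrtimer *timer)
{
	struct cpu_base *cpu_base;
	unsigned int seq;

	do {
		cpu_base = atomic_load_explicit(&timer->cpu_base, memory_order_relaxed);
		seq = model_read_begin(&cpu_base->seq);

		if ((atomic_load_explicit(&timer->state, memory_order_relaxed) & HRTIMER_STATE_ENQUEUED) ||
		    atomic_load_explicit(&cpu_base->running, memory_order_relaxed) == timer)
			return true;
	} while (model_read_retry(&cpu_base->seq, seq) ||
		 cpu_base != atomic_load_explicit(&timer->cpu_base, memory_order_relaxed));

	return false;
}

/* First transition of __run_hrtimer(): set ->running and dequeue in one write section. */
static void model_run_start(struct cpu_base *cb, struct hrtimer *timer)
{
	model_write_begin(&cb->seq);
	atomic_store_explicit(&cb->running, timer, memory_order_relaxed);
	atomic_fetch_and_explicit(&timer->state, ~HRTIMER_STATE_ENQUEUED, memory_order_relaxed);
	model_write_end(&cb->seq);
}

/* enqueue_hrtimer(): take the write side only when this timer's callback is running. */
static void model_enqueue(struct cpu_base *cb, struct hrtimer *timer)
{
	bool need_lock = atomic_load_explicit(&cb->running, memory_order_relaxed) == timer;

	if (need_lock)
		model_write_begin(&cb->seq);
	atomic_fetch_or_explicit(&timer->state, HRTIMER_STATE_ENQUEUED, memory_order_relaxed);
	if (need_lock)
		model_write_end(&cb->seq);
}

/* Second transition of __run_hrtimer(): the callback has returned. */
static void model_run_finish(struct cpu_base *cb)
{
	atomic_store_explicit(&cb->running, NULL, memory_order_relaxed);
}
```

In this model a self-re-arming run advances seq by 2 for the __run_hrtimer() section and by another 2 for the re-enqueue, while a plain start with no running callback leaves seq untouched.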

Yes, we only have this 2nd write_seqcount_begin/end if the timer re-arms
itself, but otherwise we do not race. If another thread does
hrtimer_start() in between, we can pretend that hrtimer_active() ran
while the timer was "inactive".

What do you think?

And note that we can rewrite these 2 "write" critical sections in
__run_hrtimer() and enqueue_hrtimer() as

	cpu_base->running = timer;

	write_seqcount_begin(&cpu_base->seq);
	write_seqcount_end(&cpu_base->seq);

	__remove_hrtimer(timer);

and

	timer->state |= HRTIMER_STATE_ENQUEUED;

	write_seqcount_begin(&cpu_base->seq);
	write_seqcount_end(&cpu_base->seq);

	cpu_base->running = NULL;

So we can probably use write_seqcount_barrier(), except I am not sure
about the 2nd wmb...

Oleg.
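The rewritten sections above can be sketched the same way. Here an empty write_seqcount_begin()/write_seqcount_end() pair is modeled as increment, release fence, release fence, increment; both fences are kept explicit because whether a single barrier helper gives the same ordering is exactly the open question about the 2nd wmb. This is a hedged user-space sketch: C11 release fences approximate smp_wmb(), and the struct layouts and `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; only the field names follow the thread. */
enum { HRTIMER_STATE_INACTIVE = 0, HRTIMER_STATE_ENQUEUED = 1 };

struct hrtimer {
	atomic_int state;
};

struct cpu_base {
	atomic_uint seq;
	_Atomic(struct hrtimer *) running;
};

/* write_seqcount_begin() immediately followed by write_seqcount_end():
 * seq++, wmb, wmb, seq++ (release fences stand in for smp_wmb()). */
static void model_empty_write_section(atomic_uint *seq)
{
	atomic_fetch_add_explicit(seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release); /* wmb from the _begin() half */
	atomic_thread_fence(memory_order_release); /* wmb from the _end() half */
	atomic_fetch_add_explicit(seq, 1, memory_order_relaxed);
}

/* Rewritten first section: publish ->running, bump seq, then dequeue. */
static void model_run_start(struct cpu_base *cb, struct hrtimer *timer)
{
	atomic_store_explicit(&cb->running, timer, memory_order_relaxed);
	model_empty_write_section(&cb->seq);
	atomic_fetch_and_explicit(&timer->state, ~HRTIMER_STATE_ENQUEUED, memory_order_relaxed);
}

/* Rewritten second section (self re-arm): enqueue, bump seq, clear ->running. */
static void model_rearm_and_finish(struct cpu_base *cb, struct hrtimer *timer)
{
	atomic_fetch_or_explicit(&timer->state, HRTIMER_STATE_ENQUEUED, memory_order_relaxed);
	model_empty_write_section(&cb->seq);
	atomic_store_explicit(&cb->running, NULL, memory_order_relaxed);
}
```

In both sections the store that must be visible first (->running = timer, or ->state |= ENQUEUED) sits before the seq bump, and the store that may be missed sits after it, which is the ordering the rewrite is meant to preserve.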