On Mon, 7 May 2012, Paul E. McKenney wrote:
> On Mon, May 07, 2012 at 09:21:54AM -0700, Hugh Dickins wrote:
> >
> > In 70 hours I got six isolated messages like the below (but from
> > different __might_sleep callsites) - where before I'd have flurries
> > of hundreds(?) and freeze within the hour.
> >
> > And the "rcu_nesting" debug line I'd added to the message was different:
> > where before it was showing ffffffff on some tasks and 1 on others i.e.
> > increment or decrement had been applied to the wrong task, these messages
> > now all showed 0s throughout i.e. by the time the message was printed,
> > there was no longer any justification for the message.
> >
> > As if a memory barrier were missing somewhere, perhaps.
>
> These fields should be updated only by the corresponding CPU, so
> if memory barriers are needed, it seems to me that the cross-CPU
> access is the bug, not the lack of a memory barrier.
Yes: the code you added appeared to be using local CPU accesses only
(very much intentionally), and the context switch should already have
provided all the memory barriers needed there.

> Ah...  Is preemption disabled across the access to RCU's nesting level
> when printing out the message?  If not, a preemption at that point
> could result in the value printed being inaccurate.

Preemption was enabled in the cases I saw.

So you're pointing out that

	#define rcu_preempt_depth() (__this_cpu_read(rcu_read_lock_nesting))

should have been

	#define rcu_preempt_depth() (this_cpu_read(rcu_read_lock_nesting))

to avoid the danger of spurious __might_sleep() warnings.

Yes, I believe you've got it - thanks.

Hugh
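For readers following the distinction: __this_cpu_read() assumes its caller
has already disabled preemption, while this_cpu_read() is safe from
preemptible context.  On an architecture using the generic per-CPU fallbacks
(as powerpc should here), the __ variant expands to roughly "compute this
CPU's address, then load from it", so a preemption and migration between
those two steps returns some other CPU's copy - that is, some other task's
rcu_read_lock_nesting, hence the spurious warnings.  Below is a rough sketch
of the two generic fallbacks, simplified from memory of the percpu headers
(not the exact macros, which go through size-specific helpers):

	/*
	 * Sketch of the generic __this_cpu_read() fallback: a bare load
	 * of this CPU's copy, so the caller must already have preemption
	 * disabled (or not care which CPU's copy it sees).
	 */
	#define __this_cpu_read(pcp)	(*__this_cpu_ptr(&(pcp)))

	/*
	 * Sketch of the preempt-safe this_cpu_read() fallback: the task
	 * cannot migrate between computing its CPU's address and loading
	 * the value, because preemption is disabled across both steps.
	 */
	#define this_cpu_read(pcp)					\
	({								\
		typeof(pcp) ret__;					\
		preempt_disable();					\
		ret__ = *__this_cpu_ptr(&(pcp));			\
		preempt_enable();					\
		ret__;							\
	})

On x86 both forms collapse to a single segment-relative load, so the
preempt-safe variant costs nothing extra there; on architectures that fall
back to the generic code, the preempt_disable()/preempt_enable() pair is
what closes the migration window that __might_sleep() was tripping over.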