On Sat, Nov 02, 2013 at 10:32:39AM -0700, Paul E. McKenney wrote:
> On Fri, Nov 01, 2013 at 03:56:34PM +0100, Peter Zijlstra wrote:
> > On Wed, Oct 30, 2013 at 11:40:15PM -0700, Paul E. McKenney wrote:
> > > > Now the whole crux of the question is if we need barrier A at all, since
> > > > the STORES issued by the @buf writes are dependent on the ubuf->tail
> > > > read.
> > >
> > > The dependency you are talking about is via the "if" statement?
> > > Even C/C++11 is not required to respect control dependencies.
> > >
> > > This one is a bit annoying. The x86 TSO means that you really only
> > > need barrier(), ARM (recent ARM, anyway) and Power could use a weaker
> > > barrier, and so on -- but smp_mb() emits a full barrier.
> > >
> > > Perhaps a new smp_tmb() for TSO semantics, where reads are ordered
> > > before reads, writes before writes, and reads before writes, but not
> > > writes before reads? Another approach would be to define a per-arch
> > > barrier for this particular case.
> >
> > I suppose we can only introduce new barrier primitives if there's more
> > than 1 use-case.
>
> There probably are others.

If there was an smp_tmb(), I would likely use it in rcu_assign_pointer().
There are some corner cases that can happen with the current smp_wmb()
that would be prevented by smp_tmb(). These corner cases are a bit
strange, as follows:

	struct foo *gp;

	void P0(void)
	{
		struct foo *p = kmalloc(sizeof(*p), GFP_KERNEL);

		if (!p)
			return;
		ACCESS_ONCE(p->a) = 0;
		BUG_ON(ACCESS_ONCE(p->a));
		rcu_assign_pointer(gp, p);
	}

	void P1(void)
	{
		struct foo *p = rcu_dereference(gp);

		if (!p)
			return;
		ACCESS_ONCE(p->a) = 1;
	}

With smp_wmb(), the BUG_ON() can occur because smp_wmb() does not
prevent the CPU from reordering the read in the BUG_ON() with the
rcu_assign_pointer(). With smp_tmb(), it could not. Now, I am not
too worried about this because I cannot think of any use for code like
that in P0() and P1(). But if there was an smp_tmb(), it would be
cleaner to make the BUG_ON() impossible.

							Thanx, Paul

> > > > If the read shows no available space, we simply will not issue those
> > > > writes -- therefore we could argue we can avoid the memory barrier.
> > >
> > > Proving that means iterating through the permitted combinations of
> > > compilers and architectures... There is always hand-coded assembly
> > > language, I suppose.
> >
> > I'm starting to think that while the C/C++ language spec says they can
> > wreck the world by doing these silly optimizations, real world users will
> > push back for breaking their existing code.
> >
> > I'm fairly sure the GCC people _will_ get shouted at _loudly_ when they
> > break the kernel by doing crazy shit like that.
> >
> > Given it's near impossible to write a correct program in C/C++ and
> > tagging the entire kernel with __atomic is equally not going to happen,
> > I think we must find a practical solution.
> >
> > Either that, or we really need to consider forking the language and
> > compiler :-(
>
> Depends on how much benefit the optimizations provide.
> If they provide little or no benefit, I am with you, otherwise we will
> need to bite some bullet or another. Keep in mind that there is a lot
> of code in the kernel that runs sequentially (e.g., due to being fully
> protected by locks), and aggressive optimizations for that sort of code
> are harmless.
>
> Can't say I know the answer at the moment, though.
>
> 							Thanx, Paul
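
For anyone wanting to experiment with the P0()/P1() corner case outside
the kernel tree, the ordering that the proposed smp_tmb() would provide
can be approximated in userspace C11. This is only a sketch under stated
assumptions: tmb_sketch(), publish() and update() are hypothetical names,
smp_tmb() itself exists only as a proposal in this thread, and an acq_rel
fence is assumed to be a reasonable analogue (roughly, it orders earlier
loads and stores before later stores and earlier loads before later
loads, but not earlier stores before later loads). The acquire load in
update() is stronger than what rcu_dereference() relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct foo {
	_Atomic int a;
};

/* Global pointer published by publish() and read by update(). */
static struct foo *_Atomic gp;

/*
 * Hypothetical stand-in for the proposed smp_tmb(); this is an
 * assumption for illustration, not a kernel API.
 */
static inline void tmb_sketch(void)
{
	atomic_thread_fence(memory_order_acq_rel);
}

/* P0-like publisher: initialize, read back, then publish the pointer. */
static int publish(struct foo *p)
{
	int seen;

	atomic_store_explicit(&p->a, 0, memory_order_relaxed);
	seen = atomic_load_explicit(&p->a, memory_order_relaxed);
	assert(seen == 0);	/* plays the role of the BUG_ON() */

	/* Orders the read-back above before the store to gp below. */
	tmb_sketch();
	atomic_store_explicit(&gp, p, memory_order_relaxed);
	return seen;
}

/* P1-like updater: returns 1 if it saw a published pointer, else 0. */
static int update(void)
{
	struct foo *p = atomic_load_explicit(&gp, memory_order_acquire);

	if (!p)
		return 0;
	atomic_store_explicit(&p->a, 1, memory_order_relaxed);
	return 1;
}
```

In this sketch the fence sits between the read-back of p->a and the
store to gp, so an updater that has observed the pointer cannot have its
store to p->a show up in the publisher's earlier read. A plain release
fence would also be enough for this particular pattern in C11; acq_rel
is used only because it is the closer match to the TSO-style semantics
described above.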