Hi Chris, On Thu, Sep 26, 2013 at 06:24:53PM +0100, Chris Metcalf wrote: > It turns out the kernel relies on barrier() to force a reload of the > percpu offset value. Since we can't easily modify the definition of > barrier() to include "tp" as an output register, we instead provide a > definition of __my_cpu_offset as extended assembly that includes a fake > stack read to hazard against barrier(), forcing gcc to know that it > must reread "tp" and recompute anything based on "tp" after a barrier. > > This fixes observed hangs in the slub allocator when we are looping > on a percpu cmpxchg_double. > > A similar fix for ARMv7 was made in June in change 509eb76ebf97. > > Cc: sta...@vger.kernel.org > Signed-off-by: Chris Metcalf <cmetc...@tilera.com> > --- > Ben Herrenschmidt offered to look into extending barrier() in some > per-architecture way, but for now this patch is the minimal thing to > fix the bug in 3.12 and earlier, so I'm planning to push it as-is. > > arch/tile/include/asm/percpu.h | 33 ++++++++++++++++++++++++++++++--- > 1 file changed, 30 insertions(+), 3 deletions(-) > > diff --git a/arch/tile/include/asm/percpu.h b/arch/tile/include/asm/percpu.h > index 63294f5..de121d3 100644 > --- a/arch/tile/include/asm/percpu.h > +++ b/arch/tile/include/asm/percpu.h > @@ -15,9 +15,36 @@ > #ifndef _ASM_TILE_PERCPU_H > #define _ASM_TILE_PERCPU_H > > -register unsigned long __my_cpu_offset __asm__("tp"); > -#define __my_cpu_offset __my_cpu_offset > -#define set_my_cpu_offset(tp) (__my_cpu_offset = (tp)) > +register unsigned long my_cpu_offset_reg asm("tp"); > + > +#ifdef CONFIG_PREEMPT > +/* > + * For full preemption, we can't just use the register variable > + * directly, since we need barrier() to hazard against it, causing the > + * compiler to reload anything computed from a previous "tp" value. > + * But we also don't want to use volatile asm, since we'd like the > + * compiler to be able to cache the value across multiple percpu reads. > + * So we use a fake stack read as a hazard against barrier(). > + */ > +static inline unsigned long __my_cpu_offset(void) > +{ > + unsigned long tp; > + register unsigned long *sp asm("sp"); > + asm("move %0, tp" : "=r" (tp) : "m" (*sp)); > + return tp; > +}
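To spell the trick out for anyone skimming the archives: a plain
register variable carries no memory dependency, so the "memory" clobber
in barrier() gives GCC no reason to recompute anything derived from it.
Hanging a dummy "m" operand off a non-volatile asm creates exactly that
dependency. Here is a stand-alone sketch of the mechanics in portable C;
an ordinary variable stands in for the "tp" register, and the names
current_offset/read_offset are made up for illustration, so this shows
the compiler-dependency side only, not the real register reload:

#include <stdio.h>

#define barrier()       __asm__ __volatile__("" ::: "memory")

static unsigned long current_offset;    /* stand-in for the "tp" register */

/*
 * Non-volatile asm, so GCC may CSE repeated calls and cache the result
 * (the behaviour we want between barriers).  The dummy "m" input makes
 * the asm depend on memory, so the "memory" clobber in barrier() is a
 * hazard against it and forces the value to be re-read afterwards.
 */
static inline unsigned long read_offset(void)
{
        unsigned long off;

        __asm__("" : "=r" (off)
                   : "0" (current_offset), "m" (current_offset));
        return off;
}

int main(void)
{
        unsigned long a, b;

        current_offset = 42;
        a = read_offset();

        current_offset = 99;    /* think: migrated to another cpu */
        barrier();
        b = read_offset();      /* must observe 99, not a cached 42 */

        printf("%lu %lu\n", a, b);
        return 0;
}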
Hehe, nice to see this hack working out for you too. One thing to
check is whether you have any funky addressing modes (things like
writeback or post-increment), since the "m" constraint can bite you if
you don't actually use the operand in the asm: the compiler is allowed
to pick an auto-modify form for an "m" operand, and if the template
never references it, the side effect the compiler expected simply
never happens.
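For comparison, the ARM version of this (509eb76ebf97) sidestepped that
by using the "Q" constraint instead of "m": "Q" only ever matches a
bare single-base-register memory operand, so no writeback form can be
chosen for an operand the template doesn't use. Roughly, quoting from
memory, so double-check against the tree:

static inline unsigned long __my_cpu_offset(void)
{
        unsigned long off;
        register unsigned long *sp asm ("sp");

        /*
         * Read TPIDRPRW.  Kept non-volatile so the value may be
         * cached, with the fake stack read as the hazard against
         * barrier().  "Q" restricts the operand to a plain [base]
         * reference, so no auto-modify mode can be selected.
         */
        asm("mrc p15, 0, %0, c13, c0, 4" : "=r" (off) : "Q" (*sp));

        return off;
}

Will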