Thanks for taking the time to look at this, Ben; comments inline.

Benjamin Herrenschmidt wrote:
> On Tue, 2009-05-26 at 14:19 -0400, Geoff Thorpe wrote:
>> NOT FOR COMMIT, THIS IS A REQUEST FOR FEEDBACK.
>>
>> The bitops.h functions that operate on a single bit in a bitfield are
>> implemented by operating on the corresponding word location. In all cases
>> the inner logic appears to be valid if the mask being applied has more
>> than one bit set, so this patch exposes those inner operations. Indeed,
>> set_bits() was already available, but it duplicated code from set_bit()
>> (rather than making the latter a wrapper) - it was also missing the
>> PPC405_ERR77() workaround and the "volatile" address qualifier present in
>> other APIs. This corrects that, and exposes the other multi-bit
>> equivalents.
>>
>> One advantage of these multi-bit forms is that they allow word-sized
>> variables to essentially be their own spinlocks.
>
> Hi ! Sorry for the delay, that was on my "have a look one of these days,
> low priority" list for a bit too long :-)
NP, optimal throughput often requires a compromise in latency :-)

>
>> NB, the same factoring is possible in asm-generic/bitops/[non-]atomic.h.
>> I would be happy to provide corresponding patches if this approach is
>> deemed appropriate. Feedback would be most welcome.
>>
>> Signed-off-by: Geoff Thorpe <geoff.tho...@freescale.com>
>> ---
>>  arch/powerpc/include/asm/bitops.h |  111 +++++++++++++++++++++++--------------
>>  1 files changed, 69 insertions(+), 42 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/bitops.h b/arch/powerpc/include/asm/bitops.h
>> index 897eade..72de28c 100644
>> --- a/arch/powerpc/include/asm/bitops.h
>> +++ b/arch/powerpc/include/asm/bitops.h
>> @@ -56,11 +56,10 @@
>>  #define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
>>  #define BITOP_LE_SWIZZLE	((BITS_PER_LONG-1) & ~0x7)
>>
>> -static __inline__ void set_bit(int nr, volatile unsigned long *addr)
>> +static __inline__ void set_bits(unsigned long mask, volatile unsigned long *_p)
>>  {
>>  	unsigned long old;
>> -	unsigned long mask = BITOP_MASK(nr);
>> -	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
>> +	unsigned long *p = (unsigned long *)_p;
>>
>>  	__asm__ __volatile__(
>>  "1:"	PPC_LLARX "%0,0,%3	# set_bit\n"
>> @@ -73,11 +72,16 @@ static __inline__ void set_bit(int nr, volatile unsigned long *addr)
>>  	: "cc" );
>>  }
>>
>> -static __inline__ void clear_bit(int nr, volatile unsigned long *addr)
>> +static __inline__ void set_bit(int nr, volatile unsigned long *addr)
>> +{
>> +	set_bits(BITOP_MASK(nr), addr + BITOP_WORD(nr));
>> +}
>
> No objection with the above.
>
>> +static __inline__ void clear_bits(unsigned long mask,
>> +				  volatile unsigned long *_p)
>>  {
>>  	unsigned long old;
>> -	unsigned long mask = BITOP_MASK(nr);
>> -	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
>> +	unsigned long *p = (unsigned long *)_p;
>>
>>  	__asm__ __volatile__(
>>  "1:"	PPC_LLARX "%0,0,%3	# clear_bit\n"
>> @@ -90,11 +94,16 @@ static __inline__ void clear_bit(int nr, volatile unsigned long *addr)
>>  	: "cc" );
>>  }
>>
>> -static __inline__ void clear_bit_unlock(int nr, volatile unsigned long *addr)
>> +static __inline__ void clear_bit(int nr, volatile unsigned long *addr)
>> +{
>> +	clear_bits(BITOP_MASK(nr), addr + BITOP_WORD(nr));
>> +}
>
> Looks good too.
>
>> +static __inline__ void clear_bits_unlock(unsigned long mask,
>> +					  volatile unsigned long *_p)
>>  {
>>  	unsigned long old;
>> -	unsigned long mask = BITOP_MASK(nr);
>> -	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
>> +	unsigned long *p = (unsigned long *)_p;
>>
>>  	__asm__ __volatile__(
>>  	LWSYNC_ON_SMP
>> @@ -108,11 +117,16 @@ static __inline__ void clear_bit_unlock(int nr, volatile unsigned long *addr)
>>  	: "cc", "memory");
>>  }
>>
>> -static __inline__ void change_bit(int nr, volatile unsigned long *addr)
>> +static __inline__ void clear_bit_unlock(int nr, volatile unsigned long *addr)
>> +{
>> +	clear_bits_unlock(BITOP_MASK(nr), addr + BITOP_WORD(nr));
>> +}
>
> I'm not sure it's useful to provide a multi-bit variant of the
> "lock" and "unlock" primitives. Do other archs do ?

For clear_bit_unlock(), no they don't appear to. There is a fallback in
include/asm-generic though, and I notice that it's used in a few places,
e.g. drivers/rtc/rtc-dev. OTOH some other archs appear to provide their
own test_and_set_bit_lock(), and there's a fallback in
include/asm-generic for that too. Do you see a reason to isolate either
of these and not factor out the inner word-based logic?
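To make the "word as its own spinlock" remark from the changelog
concrete, here's the sort of pattern I have in mind for the multi-bit
unlock. This is only a sketch: the MYDEV_* names are invented for
illustration, and it assumes the clear_bits_unlock() from this patch
plus the existing test_and_set_bit_lock() and cpu_relax():

	#define MYDEV_LOCK	0x1UL	/* bit 0 is the lock itself */
	#define MYDEV_READY	0x2UL	/* remaining bits are state */
	#define MYDEV_ERROR	0x4UL	/* guarded by the lock bit */

	static unsigned long mydev_state;	/* one word: lock + state */

	static void mydev_reset(void)
	{
		/* acquire: spin on the lock bit (bit 0 == MYDEV_LOCK) */
		while (test_and_set_bit_lock(0, &mydev_state))
			cpu_relax();

		/* ... update whatever else the lock protects ... */

		/* release: drop the lock and clear the state bits in
		 * a single atomic read-modify-write */
		clear_bits_unlock(MYDEV_LOCK | MYDEV_READY | MYDEV_ERROR,
				  &mydev_state);
	}

With only the single-bit primitives, the release would take two atomic
operations (clear the state bits, then clear_bit_unlock() on the lock
bit); the multi-bit form folds them into one.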
>
>> +static __inline__ void change_bits(unsigned long mask,
>> +				   volatile unsigned long *_p)
>>  {
>>  	unsigned long old;
>> -	unsigned long mask = BITOP_MASK(nr);
>> -	unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);
>> +	unsigned long *p = (unsigned long *)_p;
>>
>>  	__asm__ __volatile__(
>>  "1:"	PPC_LLARX "%0,0,%3	# change_bit\n"
>> @@ -125,12 +139,16 @@ static __inline__ void change_bit(int nr, volatile unsigned long *addr)
>>  	: "cc" );
>>  }
>>
>> -static __inline__ int test_and_set_bit(unsigned long nr,
>> -					volatile unsigned long *addr)
>> +static __inline__ void change_bit(int nr, volatile unsigned long *addr)
>> +{
>> +	change_bits(BITOP_MASK(nr), addr + BITOP_WORD(nr));
>> +}
>
> Ah, patch is getting confused between change_bit and
> test_and_set_bit :-)

The diff machinery off-by-one'd the entire file, which makes the patch a
little more difficult to review. But I didn't feel like (further)
massacring the code in order to improve the diff. :-)

> Now, you know what I'm thinking is ... Those are all the same except
> for:
>
>  - Barriers for lock and unlock variants
>  - Barriers for "return" (aka test_and_set) variants
>  - Actual op done on the mask

Yup, believe it or not, I saw this coming but didn't have the guts to
try proposing something like it up-front (in particular, I was wary of
botching some subtleties in the assembly).

> Maybe we can shrink that file significantly (and avoid the risk for
> typos etc...) by generating them all from a macro.
>
> Something like (typed directly into the mailer :-)
>
> #define DEFINE_BITOP(op, prefix, postfix) \
> 	asm volatile ( \
> 	prefix \
> "1:"	PPC_LLARX "%0,0,%3\n" \
> 	__stringify(op) "%1,%0,%2\n" \
> 	PPC405_ERR77(0,%3) \
> 	PPC_STLCX "%1,0,%3\n" \
> 	"bne- 1b\n" \
> 	postfix \
> 	: "=&r" (old), "=&r" (t)
> 	: "r" (mask), "r" (p)
> 	: "cc", "memory")
>
> and so:
>
> static inline void set_bits(unsigned long mask, volatile unsigned long *addr)
> {
> 	unsigned long old, t;
>
> 	DEFINE_BITOP(or, "", "");
> }
>
> static inline void test_and_set_bits(unsigned long mask, volatile unsigned long *addr)
> {
> 	unsigned long old, t;
>
> 	DEFINE_BITOP(or, LWSYNC_ON_SMP, ISYNC_ON_SMP);
>
> 	return (old & mask) != 0;
> }
>
> etc...

Sounds good, I'll try working this up and I'll send a new patch shortly.
Can I take it, implicitly, that changing the set_bits() prototype to add
the 'volatile' qualifier (and adding the missing PPC405_ERR77()
workaround) was OK?

Also, any opinion on whether the same re-factoring of the asm-generic
versions should be undertaken? I'm not looking to bite off more than I
can chew, but I don't know if it's frowned upon to make powerpc-only
extensions to the API. And if you think an asm-generic patch makes
sense, could that be taken via linuxppc-dev too or does it need to go
elsewhere?

Thanks again,
Geoff
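P.S. One detail in your sketch: test_and_set_bits() is declared void but
returns a value, so when I work this up I'll probably give the "test"
variants their own macro with a return path. Roughly what I have in
mind, equally untested and typed into the mailer, so treat it as a
sketch of the direction rather than the eventual patch:

	#define DEFINE_TESTOP(fn, op, prefix, postfix)		\
	static __inline__ unsigned long fn(unsigned long mask,	\
			volatile unsigned long *_p)		\
	{							\
		unsigned long old, t;				\
		unsigned long *p = (unsigned long *)_p;		\
		__asm__ __volatile__(				\
		prefix						\
	"1:"	PPC_LLARX "%0,0,%3\n"				\
		__stringify(op) " %1,%0,%2\n"			\
		PPC405_ERR77(0,%3)				\
		PPC_STLCX "%1,0,%3\n"				\
		"bne- 1b\n"					\
		postfix						\
		: "=&r" (old), "=&r" (t)			\
		: "r" (mask), "r" (p)				\
		: "cc", "memory");				\
		return (old & mask);				\
	}

	DEFINE_TESTOP(test_and_set_bits, or, LWSYNC_ON_SMP, ISYNC_ON_SMP)

	static __inline__ int test_and_set_bit(unsigned long nr,
					       volatile unsigned long *addr)
	{
		return test_and_set_bits(BITOP_MASK(nr),
					 addr + BITOP_WORD(nr)) != 0;
	}

The non-returning ops (set_bits() and friends) would then come from a
simpler DEFINE_BITOP without the trailing return, much as in your
sketch.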