Excerpts from Segher Boessenkool's message of April 14, 2021 7:58 am: > On Tue, Apr 13, 2021 at 06:33:19PM +0200, Christophe Leroy wrote: >> Le 12/04/2021 à 23:54, Segher Boessenkool a écrit : >> >On Thu, Apr 08, 2021 at 03:33:44PM +0000, Christophe Leroy wrote: >> >>For clear bits, on 32 bits 'rlwinm' can be used instead or 'andc' for >> >>when all bits to be cleared are consecutive. >> > >> >Also on 64-bits, as long as both the top and bottom bits are in the low >> >32-bit half (for 32 bit mode, it can wrap as well). >> >> Yes. But here we are talking about clearing a few bits, all other ones must >> remain unchanged. An rlwinm on PPC64 will always clear the upper part, >> which is unlikely what we want. > > No, it does not. It takes the low 32 bits of the source reg, duplicated > to the top half as well, then rotated, then ANDed with the mask (which > can wrap around). This isn't very often very useful, but :-) > > (One useful operation is splatting 32 bits to both halves of a 64-bit > register, which is just rlwinm d,s,0,1,0). > > If you only look at the low 32 bits, it does exactly the same as on > 32-bit implementations. > >> >>For the time being only >> >>handle the single bit case, which we detect by checking whether the >> >>mask is a power of two. >> > >> >You could look at rs6000_is_valid_mask in GCC: >> > >> > <https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/rs6000/rs6000.c;h=48b8efd732b251c059628096314848305deb0c0b;hb=HEAD#l11148> >> >used by rs6000_is_valid_and_mask immediately after it. You probably >> >want to allow only rlwinm in your case, and please note this checks if >> >something is a valid mask, not the inverse of a valid mask (as you >> >want here). >> >> This check looks more complex than what I need. It is used for both rlw... >> and rld..., and it calculates the operants. The only thing I need is to >> validate the mask. > > It has to do exactly the same thing for rlwinm as for all 64-bit > variants (rldicl, rldicr, rldic). > > One side effect of calculation the bit positions with exact_log2 is that > that returns negative if the argument is not a power of two. > > Here is a simpler way, that handles all cases: input in "u32 val": > > if (!val) > return nonono; > if (val & 1) > val = ~val; // make the mask non-wrapping > val += val & -val; // adding the low set bit should result in > // at most one bit set > if (!(val & (val - 1))) > return okidoki_all_good; > >> I found a way: By anding the mask with the complement of itself rotated by >> left bits to 1, we identify the transitions from 0 to 1. If the result is a >> power of 2, it means there's only one transition so the mask is as expected. > > That does not handle all cases (it misses all bits set at least). Which > isn't all that interesting of course, but is a valid mask (but won't > clear any bits, so not too interesting for your specific case :-) )
Would be nice if we could let the compiler deal with it all... static inline unsigned long lr(unsigned long *mem) { unsigned long val; /* * This doesn't clobber memory but want to avoid memory operations * moving ahead of it */ asm volatile("ldarx %0, %y1" : "=r"(val) : "Z"(*mem) : "memory"); return val; } static inline bool stc(unsigned long *mem, unsigned long val) { /* * This doesn't really clobber memory but same as above, also can't * specify output in asm goto. */ asm volatile goto( "stdcx. %0, %y1 \n\t" "bne- %l[fail] \n\t" : : "r"(val), "Z"(*mem) : "cr0", "memory" : fail); return true; fail: __attribute__((cold)) return false; } static inline void atomic_add(unsigned long *mem, unsigned long val) { unsigned long old, new; do { old = lr(mem); new = old + val; } while (unlikely(!stc(mem, new))); }