On Fri, Jul 24, 2015 at 3:55 AM, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote:
> Hi all,
>
> This patch implements an aarch64-specific expansion of the signed modulo by
> a power of 2.
> The proposed sequence makes use of the conditional negate instruction CSNEG.
> For N a power of 2, x % N can be calculated with:
> negs   x1, x0
> and    x0, x0, #(N - 1)
> and    x1, x1, #(N - 1)
> csneg  x0, x0, x1, mi
>
> So, for N == 256 this would be:
> negs   x1, x0
> and    x0, x0, #255
> and    x1, x1, #255
> csneg  x0, x0, x1, mi
>
> For comparison, the existing sequence emitted by expand_smod_pow2 in
> expmed.c is:
> asr    x1, x0, 63
> lsr    x1, x1, 56
> add    x0, x0, x1
> and    x0, x0, 255
> sub    x0, x0, x1
>
> Note that the CSNEG sequence is one instruction shorter and that the two
> and operations are independent, compared to the existing sequence where
> all instructions are dependent on the preceding instructions.
Just FYI. For ThunderX, this is a size win and a performance win, at least in
a microbenchmark.

>
> For the special case of N == 2 we can do even better:
> cmp    x0, xzr
> and    x0, x0, 1
> csneg  x0, x0, x0, ge

This is a size win and a performance win on ThunderX.

>
> I first tried implementing this in the generic code in expmed.c but that
> didn't work out for a few reasons:
>
> * This relies on having a conditional-negate instruction. We could gate it
> on HAVE_conditional_move and the combiner is capable of merging the final
> negate into the conditional move if a conditional negate is available (like
> on aarch64) but on targets without a conditional negate this would end up
> emitting a separate negate.
>
> * The first negs has to be a negs for the sequence to be a win, i.e. having
> a separate negate and compare makes the sequence slower than the existing
> one (at least in my microbenchmarking) and I couldn't get subsequent passes
> to combine the negate and compare into the negs (presumably due to the use
> of the negated result in one of the ands).
> Doing it in the aarch64 backend where I could just call the exact gen_*
> functions that I need worked much more cleanly.

I agree this does make it harder to implement in a target-generic way.

Thanks,
Andrew

>
> The costing logic is updated to reflect this sequence during the
> initialisation of expmed.c where it calculates the smod_pow2_cheap metric.
>
> The tests will come in patch 3 of the series which are partly shared with
> the equivalent arm implementation.
>
> Bootstrapped and tested on aarch64.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2015-07-24  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     * config/aarch64/aarch64.md (mod<mode>3): New define_expand.
>     (*neg<mode>2_compare0): Rename to...
>     (neg<mode>2_compare0): ... This.
>     * config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case): Reflect
>     CSNEG sequence in MOD by power of 2 case.
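
For anyone who wants to check the arithmetic of the three sequences quoted
above, here is a minimal C sketch that models each one instruction by
instruction and compares the result with C's truncating `%`. It is not part of
the patch: the function names (smod_pow2_csneg, smod_pow2_generic, smod2_csneg,
check_smod_pow2) are made up for illustration, and the model assumes n is a
positive power of 2 no larger than 2^62. It says nothing about performance,
only that the sequences compute the same value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the proposed sequence:
     negs  x1, x0 ; and x0, x0, #(n-1) ; and x1, x1, #(n-1) ; csneg x0, x0, x1, mi
   Assumes n is a positive power of 2, n <= 2^62.  */
static int64_t smod_pow2_csneg(int64_t x, int64_t n)
{
    uint64_t mask = (uint64_t)n - 1;
    uint64_t neg = 0 - (uint64_t)x;      /* negs: wraps like the hardware */
    int mi = (neg >> 63) != 0;           /* N flag: set when -x is negative */
    uint64_t pos_part = (uint64_t)x & mask;
    uint64_t neg_part = neg & mask;
    /* csneg: keep x & mask when MI holds, otherwise negate (-x) & mask.  */
    return mi ? (int64_t)pos_part : -(int64_t)neg_part;
}

/* Hypothetical model of the existing expand_smod_pow2 sequence:
     asr ; lsr ; add ; and ; sub
   The asr/lsr pair produces n-1 for negative x and 0 otherwise.  */
static int64_t smod_pow2_generic(int64_t x, int64_t n)
{
    uint64_t mask = (uint64_t)n - 1;
    uint64_t bias = x < 0 ? mask : 0;                /* asr + lsr */
    uint64_t r = ((uint64_t)x + bias) & mask;        /* add + and */
    return (int64_t)r - (int64_t)bias;               /* sub */
}

/* Hypothetical model of the N == 2 special case:
     cmp x0, xzr ; and x0, x0, 1 ; csneg x0, x0, x0, ge  */
static int64_t smod2_csneg(int64_t x)
{
    int64_t bit = (int64_t)((uint64_t)x & 1);
    return x >= 0 ? bit : -bit;
}

/* Compare all three models against C's % over a small range plus the
   extreme values, for every power of 2 from 2 to 2^62.  */
void check_smod_pow2(void)
{
    for (unsigned k = 1; k <= 62; k++) {
        int64_t n = (int64_t)1 << k;
        for (int64_t x = -1000; x <= 1000; x++) {
            assert(smod_pow2_csneg(x, n) == x % n);
            assert(smod_pow2_generic(x, n) == x % n);
        }
        assert(smod_pow2_csneg(INT64_MIN, n) == INT64_MIN % n);
        assert(smod_pow2_csneg(INT64_MAX, n) == INT64_MAX % n);
        assert(smod_pow2_generic(INT64_MIN, n) == INT64_MIN % n);
        assert(smod_pow2_generic(INT64_MAX, n) == INT64_MAX % n);
    }
    for (int64_t x = -1000; x <= 1000; x++)
        assert(smod2_csneg(x) == x % 2);
}
```

Calling check_smod_pow2() from a small main() runs the comparison. The model
does the negation in unsigned arithmetic so that x == INT64_MIN wraps the way
negs does instead of hitting signed-overflow undefined behaviour in C.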