On 08/14/2018 11:01 AM, Vlad Lazar wrote:
> On 13/08/18 15:00, Richard Sandiford wrote:
>> Vlad Lazar <vlad.la...@arm.com> writes:
>>> diff --git a/gcc/expmed.h b/gcc/expmed.h
>>> index
>>> 2890d9c9bbd034f01030dd551d544bf73e73b784..86a32a643fdd0fc9f396bd2c7904244bd484df16
>>> 100644
>>> --- a/gcc/expmed.h
>>> +++ b/gcc/expmed.h
>>> @@ -702,6 +702,10 @@ extern rtx emit_store_flag (rtx, enum rtx_code,
>>> rtx, rtx, machine_mode,
>>>   extern rtx emit_store_flag_force (rtx, enum rtx_code, rtx, rtx,
>>>   machine_mode, int, int);
>>> +extern void canonicalize_comparison (machine_mode, enum rtx_code *,
>>> rtx *);
>>> +
>>> +extern enum rtx_code canonicalized_cmp_code (enum rtx_code);
>>
>> It would probably be better to make the second function static (local
>> to expmed.c), since it's only used internally by canonicalize_comparison.
>> Switching the order of the two functions in expmed.c would avoid the need
>> for a forward declaration.
>>
>>> @@ -6153,6 +6156,97 @@ emit_store_flag_force (rtx target, enum
>>> rtx_code code, rtx op0, rtx op1,
>>>     return target;
>>>   }
>>> +
>>> +/* Choose the more appropiate immediate in comparisons.  The purpose
>>> of this
>>
>> Maybe worth saying "scalar integer comparisons", since the function
>> doesn't handle vector or floating-point comparisons.
>>
>>> +   is to end up with an immediate which can be loaded into a
>>> register in fewer
>>> +   moves, if possible.
>>> +
>>> +   For each integer comparison there exists an equivalent choice:
>>> +     i)   a > b or a >= b + 1
>>> +     ii)  a <= b or a < b + 1
>>> +     iii) a >= b or a > b - 1
>>> +     iv)  a < b or a <= b - 1
>>> +
>>> +   The function is called in the gimple expanders which handle
>>> GIMPLE_ASSIGN
>>> +   and GIMPLE_COND.  It assumes that the first operand of the
>>> comparison is a
>>> +   register and the second is a constant.
>>
>> This last paragraph doesn't seem accurate any more.
>> Probably best to
>> drop it, since there's no real need to say who the callers are if we
>> describe the interface.
>>
>>> +   mode is the mode of the first operand.
>>> +   code points to the comparison code.
>>> +   imm points to the rtx containing the immediate.  */
>>
>> GCC's convention is to put parameter names in caps in comments,
>> so "MODE is...", etc.
>>
>> On the IMM line, it might be worth adding "*IMM must satisfy
>> CONST_SCALAR_INT_P on entry and continues to satisfy CONST_SCALAR_INT_P
>> on exit."
>>
>>> +void canonicalize_comparison (machine_mode mode, enum rtx_code
>>> *code, rtx *imm)
>>> +{
>>> +  int to_add = 0;
>>> +
>>> +  if (!SCALAR_INT_MODE_P (mode))
>>> +    return;
>>> +
>>> +  /* Extract the immediate value from the rtx.  */
>>> +  wide_int imm_val = wi::shwi (INTVAL (*imm), mode);
>>
>> This should be:
>>
>>   rtx_mode_t imm_val (*imm, mode);
>>
>> so that it copes with all CONST_SCALAR_INT_P, not just CONST_INT.
>>
>>> +  if (*code == GT || *code == GTU || *code == LE || *code == LEU)
>>> +    to_add = 1;
>>> +
>>> +  if (*code == GE || *code == GEU || *code == LT || *code == LTU)
>>> +    to_add = -1;
>>
>> Might be better to have:
>>
>>   if (*code == GT || *code == GTU || *code == LE || *code == LEU)
>>     to_add = 1;
>>   else if (*code == GE || *code == GEU || *code == LT || *code == LTU)
>>     to_add = -1;
>>   else
>>     return;
>>
>> so that we exit early for other comparisons, rather than doing wasted
>> work.
>>
>>> +  /* Check for overflow/underflow in the case of signed values and
>>> +     wrapping around in the case of unsigned values.  If any occur
>>> +     cancel the optimization.  */
>>> +  wide_int max_val = wi::max_value (mode, SIGNED);
>>> +  wide_int min_val = wi::min_value (mode, SIGNED);
>>> +  if ((wi::cmps (imm_val, max_val) == 0 && to_add == 1)
>>> +      || (wi::cmps (imm_val, min_val) == 0 && to_add == -1))
>>> +    return;
>>
>> This needs to use the SIGNED/UNSIGNED choice appropriate for the
>> comparison (SIGNED for GT, UNSIGNED for GTU, etc.).
>> There doesn't
>> seem to be an existing function that says whether an rtx comparison
>> operation is signed or not (bit of a surprise), but there is
>> unsigned_condition, which returns the unsigned form of an rtx
>> comparison operator.  It might be worth adding something like:
>>
>>   /* Return true if integer comparison operator CODE interprets its
>>      operands as unsigned.  */
>>
>>   inline bool
>>   unsigned_condition_p (rtx_code code)
>>   {
>>     return unsigned_condition (code) == code;
>>   }
>>
>> and using that to choose between SIGNED and UNSIGNED.
>>
>> Rather than using wi::cmp, a more direct way of checking for overflow
>> is to change this:
>>
>>> +  /* Generate a mov instruction for both cases and see whether the
>>> change
>>> +     was profitable.  */
>>> +  wide_int imm_modif = wi::add (imm_val, to_add);
>>
>> to use the overflow form of wi::add, i.e.:
>>
>>   wide_int imm_modif = wi::add (imm_val, to_add, sign, &overflow);
>>
>> and return if "overflow" is set.
>>
>>> +
>>> +  rtx reg = gen_reg_rtx (mode);
>>
>> gen_reg_rtx creates a new pseudo register that essentially stays
>> around until we've finished compiling the function.  It's better to use:
>>
>>   gen_rtx_REG (mode, LAST_VIRTUAL_REGISTER + 1);
>>
>> for cost calculations, so that we don't waste pseudo register numbers.
>>
>>> +  rtx new_imm = GEN_INT (imm_modif.to_shwi ());
>>
>> This should be:
>>
>>   rtx new_imm = immed_wide_int_const (imm_modif, mode);
>>
>> to cope with non-CONST_INT immediates, and to ensure that CONST_INT
>> immediates are properly sign-extended.
>>
>> (The rtx interfaces are a bit clunky, sorry.)
>>
>> Patch looks good to me with those changes, but someone else will
>> need to approve.
>>
>> Thanks,
>> Richard
>
> Thanks for taking the time to point everything out.
>
> I've updated the patch according to Richard's comments.
> See the updated patch below.  OK for trunk?
>
> Thanks,
> Vlad
>
> ---
>
> This patch optimises the choice of immediates in integer comparisons.
> Integer comparisons allow for different choices (e.g. a > b is
> equivalent to a >= b + 1) and there are cases where an
> incremented/decremented immediate can be loaded into a register in
> fewer instructions.  The cases are as follows:
>
>   i)   a > b or a >= b + 1
>   ii)  a <= b or a < b + 1
>   iii) a >= b or a > b - 1
>   iv)  a < b or a <= b - 1
>
> For each comparison we check whether the equivalent can be performed
> in fewer instructions.  This is done in the gimple expanders, so it
> requires a correct implementation of the TARGET_INSN_COST hook.  The
> change is general and applies to any integer comparison, regardless
> of types or location.
>
> For example, on AArch64 for the following code:
>
> int foo (int x)
> {
>   return x > 0xfefffffe;
> }
>
> it generates:
>
> mov	w1, -16777217
> cmp	w0, w1
> cset	w0, cs
>
> instead of:
>
> mov	w1, 65534
> movk	w1, 0xfeff, lsl 16
> cmp	w0, w1
> cset	w0, hi
>
> Bootstrapped and regtested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu, and there are no regressions.
>
> Thanks,
> Vlad
>
> gcc/testsuite/
> 2018-08-14  Vlad Lazar  <vlad.la...@arm.com>
>
> 	* gcc.target/aarch64/imm_choice_comparison.c: New.
>
> gcc/
> 2018-08-14  Vlad Lazar  <vlad.la...@arm.com>
>
> 	* expmed.h (canonicalize_comparison): New declaration.
> 	* expmed.c (canonicalize_comparison, equivalent_cmp_code): New.
> 	(emit_store_flag_1): Add call to canonicalize_comparison.
> 	* optabs.c (prepare_cmp_insn): Likewise.
> 	* rtl.h (unsigned_condition_p): New function which checks if a
> 	comparison operator is unsigned.

Thanks.  I fixed up the ChangeLog entry and installed this on the trunk.
Richard S -- thanks for working with Vlad to push this forward.

jeff