On 16/08/18 17:35, Jeff Law wrote:
On 08/14/2018 11:01 AM, Vlad Lazar wrote:
On 13/08/18 15:00, Richard Sandiford wrote:
Vlad Lazar <vlad.la...@arm.com> writes:
diff --git a/gcc/expmed.h b/gcc/expmed.h
index 2890d9c9bbd034f01030dd551d544bf73e73b784..86a32a643fdd0fc9f396bd2c7904244bd484df16 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -702,6 +702,10 @@ extern rtx emit_store_flag (rtx, enum rtx_code, rtx, rtx, machine_mode,
 extern rtx emit_store_flag_force (rtx, enum rtx_code, rtx, rtx,
				   machine_mode, int, int);
+extern void canonicalize_comparison (machine_mode, enum rtx_code *, rtx *);
+
+extern enum rtx_code canonicalized_cmp_code (enum rtx_code);

It would probably be better to make the second function static (local
to expmed.c), since it's only used internally by canonicalize_comparison.
Switching the order of the two functions in expmed.c would avoid the need
for a forward declaration.

@@ -6153,6 +6156,97 @@ emit_store_flag_force (rtx target, enum rtx_code code, rtx op0, rtx op1,
   return target;
 }
+
+/* Choose the more appropriate immediate in comparisons.  The purpose of this

Maybe worth saying "scalar integer comparisons", since the function
doesn't handle vector or floating-point comparisons.

+   is to end up with an immediate which can be loaded into a register in fewer
+   moves, if possible.
+
+   For each integer comparison there exists an equivalent choice:
+     i)   a >  b or a >= b + 1
+     ii)  a <= b or a <  b + 1
+     iii) a >= b or a >  b - 1
+     iv)  a <  b or a <= b - 1
+
+   The function is called in the gimple expanders which handle GIMPLE_ASSIGN
+   and GIMPLE_COND.  It assumes that the first operand of the comparison is a
+   register and the second is a constant.

This last paragraph doesn't seem accurate any more.  Probably best to
drop it, since there's no real need to say who the callers are if we
describe the interface.

+   mode is the mode of the first operand.
+   code points to the comparison code.
+   imm points to the rtx containing the immediate.  */

GCC's convention is to put parameter names in caps in comments,
so "MODE is...", etc.

On the IMM line, it might be worth adding "*IMM must satisfy
CONST_SCALAR_INT_P on entry and continues to satisfy CONST_SCALAR_INT_P
on exit."

+void canonicalize_comparison (machine_mode mode, enum rtx_code *code, rtx *imm)
+{
+  int to_add = 0;
+
+  if (!SCALAR_INT_MODE_P (mode))
+    return;
+
+  /* Extract the immediate value from the rtx.  */
+  wide_int imm_val = wi::shwi (INTVAL (*imm), mode);

This should be:

    rtx_mode_t imm_val (*imm, mode);

so that it copes with all CONST_SCALAR_INT_P, not just CONST_INT.

+  if (*code == GT || *code == GTU || *code == LE || *code == LEU)
+    to_add = 1;
+
+  if (*code == GE || *code == GEU || *code == LT || *code == LTU)
+    to_add = -1;

Might be better to have:

    if (*code == GT || *code == GTU || *code == LE || *code == LEU)
      to_add = 1;
    else if (*code == GE || *code == GEU || *code == LT || *code == LTU)
      to_add = -1;
    else
      return;

so that we exit early for other comparisons, rather than doing wasted
work.

+  /* Check for overflow/underflow in the case of signed values and
+     wrapping around in the case of unsigned values.  If any occur
+     cancel the optimization.  */
+  wide_int max_val = wi::max_value (mode, SIGNED);
+  wide_int min_val = wi::min_value (mode, SIGNED);
+  if ((wi::cmps (imm_val, max_val) == 0 && to_add == 1)
+      || (wi::cmps (imm_val, min_val) == 0 && to_add == -1))
+    return;

This needs to use the SIGNED/UNSIGNED choice appropriate for the
comparison (SIGNED for GT, UNSIGNED for GTU, etc.).  There doesn't
seem to be an existing function that says whether an rtx comparison
operation is signed or not (bit of a surprise), but there is
unsigned_condition, which returns the unsigned form of an rtx
comparison operator.  It might be worth adding something like:

    /* Return true if integer comparison operator CODE interprets its
       operands as unsigned.  */

    inline bool
    unsigned_condition_p (rtx_code code)
    {
      return unsigned_condition (code) == code;
    }

and using that to choose between SIGNED and UNSIGNED.
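For readers without the sources handy: unsigned_condition maps each RTL comparison code to its unsigned counterpart, so the suggested helper just checks for a fixed point. Here is a standalone sketch of that behaviour (the local enum and the trailing-underscore names are invented for illustration; they are not GCC's identifiers):

```cpp
#include <cassert>

// Standalone model of rtl.h's unsigned_condition: the enum and the
// trailing-underscore names are invented for this sketch.
enum rtx_code_t { GT_, GTU_, GE_, GEU_, LT_, LTU_, LE_, LEU_ };

rtx_code_t unsigned_condition_ (rtx_code_t code)
{
  switch (code)
    {
    case GT_: case GTU_: return GTU_;
    case GE_: case GEU_: return GEU_;
    case LT_: case LTU_: return LTU_;
    case LE_: case LEU_: return LEU_;
    }
  return code;
}

// A code is unsigned iff unsigned_condition leaves it unchanged.
bool unsigned_condition_p_ (rtx_code_t code)
{
  return unsigned_condition_ (code) == code;
}
```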

Rather than using wi::cmp, a more direct way of checking for overflow
is to change this:

+  /* Generate a mov instruction for both cases and see whether the change
+     was profitable.  */
+  wide_int imm_modif = wi::add (imm_val, to_add);

to use the overflow form of wi::add, i.e.:

    wide_int imm_modif = wi::add (imm_val, to_add, sign, &overflow);

and return if "overflow" is set.
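The overflow-checked adjustment can be seen in miniature outside GCC. The following is a standalone C++ sketch of the signed half of the transformation (plain int32_t instead of wide_int, and the names canonicalize/cmp_code are invented here), bailing out exactly when the adjusted immediate would overflow, which is the condition the overflow form of wi::add reports:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Standalone sketch, not GCC's API.  Adjust a signed comparison
// "x CODE imm" to its equivalent form, returning false when
// imm + to_add would overflow.
enum cmp_code { GT_, GE_, LT_, LE_ };

bool canonicalize (cmp_code &code, int32_t &imm)
{
  int to_add = (code == GT_ || code == LE_) ? 1 : -1;
  if ((to_add == 1 && imm == std::numeric_limits<int32_t>::max ())
      || (to_add == -1 && imm == std::numeric_limits<int32_t>::min ()))
    return false;  /* Adjusting would overflow; keep the original.  */
  imm += to_add;
  switch (code)
    {
    case GT_: code = GE_; break;  /* a >  b  ==>  a >= b + 1  */
    case LE_: code = LT_; break;  /* a <= b  ==>  a <  b + 1  */
    case GE_: code = GT_; break;  /* a >= b  ==>  a >  b - 1  */
    case LT_: code = LE_; break;  /* a <  b  ==>  a <= b - 1  */
    }
  return true;
}
```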

+
+  rtx reg = gen_reg_rtx (mode);

gen_reg_rtx creates a new pseudo register that essentially stays
around until we've finished compiling the function.  It's better to use:

    gen_rtx_REG (mode, LAST_VIRTUAL_REGISTER + 1);

for cost calculations, so that we don't waste pseudo register numbers.

+  rtx new_imm = GEN_INT (imm_modif.to_shwi ());

This should be:

    rtx new_imm = immed_wide_int_const (imm_modif, mode);

to cope with non-CONST_INT immediates, and to ensure that CONST_INT
immediates are properly sign-extended.

(The rtx interfaces are a bit clunky, sorry.)
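The sign-extension point matters in practice: a narrow immediate held in a wider host integer must read back as a signed value. A standalone illustration (sign_extend is an invented helper for this sketch, not GCC's trunc_int_for_mode):

```cpp
#include <cassert>
#include <cstdint>

// Invented helper (not GCC's API): sign-extend the low BITS bits of
// VAL into a full 64-bit signed value.
int64_t sign_extend (uint64_t val, int bits)
{
  uint64_t sign_bit = uint64_t (1) << (bits - 1);
  val &= (sign_bit << 1) - 1;  /* Keep only the low BITS bits.  */
  return int64_t ((val ^ sign_bit) - sign_bit);
}
```

For example, the 32-bit pattern 0xfeffffff must come back as -16777217, matching the immediate in the AArch64 example below.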

Patch looks good to me with those changes, but someone else will
need to approve.

Thanks,
Richard


Thanks for taking the time to point everything out.

I've updated the patch according to Richard's comments.
See the updated patch below. OK for trunk?

Thanks,
Vlad

---

This patch optimises the choice of immediates in integer comparisons.
Integer comparisons allow for different choices (e.g. a > b is equivalent
to a >= b + 1) and there are cases where an incremented/decremented
immediate can be loaded into a register in fewer instructions. The cases
are as follows:

i)   a >  b or a >= b + 1
ii)  a <= b or a <  b + 1
iii) a >= b or a >  b - 1
iv)  a <  b or a <= b - 1

For each comparison we check whether the equivalent form can be performed
in fewer instructions. This is done in the gimple expanders, so it relies
on a correct implementation of the TARGET_INSN_COST hook. The change is
general and applies to any integer comparison, regardless of types or
location.
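The four equivalences are easy to sanity-check mechanically; here is a small standalone test (equivalences_hold is an invented name, not part of the patch) over a range safely away from the type's overflow boundaries:

```cpp
#include <cassert>

// Hypothetical standalone check, not part of the patch: verify that
// the four rewrites preserve semantics for signed values away from
// the overflow boundaries.
bool equivalences_hold (int lo, int hi)
{
  for (int a = lo; a <= hi; ++a)
    for (int b = lo + 1; b <= hi - 1; ++b)
      {
        if ((a >  b) != (a >= b + 1)) return false;  /* i   */
        if ((a <= b) != (a <  b + 1)) return false;  /* ii  */
        if ((a >= b) != (a >  b - 1)) return false;  /* iii */
        if ((a <  b) != (a <= b - 1)) return false;  /* iv  */
      }
  return true;
}
```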

For example, on AArch64 for the following code:

int foo (int x)
{
   return x > 0xfefffffe;
}

it generates:

mov    w1, -16777217
cmp    w0, w1
cset    w0, cs

instead of:

mov    w1, 65534
movk    w1, 0xfeff, lsl 16
cmp    w0, w1
cset    w0, hi

Bootstrapped and regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and there are no regressions.

Thanks,
Vlad

gcc/testsuite/
2018-08-14  Vlad Lazar  <vlad.la...@arm.com>

	* gcc.target/aarch64/imm_choice_comparison.c: New.

gcc/
2018-08-14  Vlad Lazar  <vlad.la...@arm.com>

	* expmed.h (canonicalize_comparison): New declaration.
	* expmed.c (canonicalize_comparison, equivalent_cmp_code): New.
	(emit_store_flag_1): Add call to canonicalize_comparison.
	* optabs.c (prepare_cmp_insn): Likewise.
	* rtl.h (unsigned_condition_p): New function which checks if a
	comparison operator is unsigned.

Thanks.  I fixed up the ChangeLog entry and installed this on the trunk.

Richard S -- thanks for working with Vlad to push this forward.

jeff

Thanks for committing.  Sorry about the ChangeLog.

Vlad
