On 08/04/14 02:24, Zhenqiang Chen wrote:
ChangeLog:
2014-05-22 Zhenqiang Chen <zhenqiang.c...@linaro.org>
Part of PR rtl-optimization/61225
* config/i386/i386-protos.h (ix86_peephole2_rtx_equal_p): New
proto.
* config/i386/i386.c (ix86_peephole2_rtx_equal_p): New function.
* regcprop.c (replace_oldest_value_reg): Add REG_EQUAL note when
propagating to SET.
I can't help but wonder why the new 4 insn combination code isn't presenting
this as a nice big fat insn to the x86 backend which would eliminate the
need for the peep2.
But, assuming there's a fundamental reason why that's not kicking in...
Current combine pass can only handle
I0 -> I1 -> I2 -> I3.
I0, I1 -> I2, I2 -> I3.
I0 -> I2; I1, I2 -> I3.
I0 -> I1; I1, I2 -> I3.
For the case, it is
I1 -> I2 -> I3; I2 -> INSN
I3 and INSN looks like not related. But INSN is a COMPARE to set CC
and I3 can also set CC. I3 and INSN can be combined together as one
instruction to set CC.
Presumably there's no dataflow between I3 and INSN because they both set
CC (doesn't that make them anti-dependent?
Can you show me the RTL corresponding to I1, I2, I3 and INSN, I simply
find it easier to look at RTL rather than guess why we don't have the
appropriate linkage and thus not attempting the combinations we want.
Pseudo code for the resulting I3 and INSN would help -- as I work
through this there's some inconsistencies in how I'm interpreting a few
things and RTL and pseudo-rtl for the desired output RTL would help a lot.
I
ChangeLog
2014-08-04 Zhenqiang Chen <zhenqiang.c...@linaro.org>
Part of PR rtl-optimization/61225
* combine.c (refer_same_reg_p): New function.
(combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn.
(try_combine): Add one more parameter TO_COMBINED_INSN, which is
used to create a new insn parallel (TO_COMBINED_INSN, I3).
testsuite/ChangeLog:
2014-08-04 Zhenqiang Chen <zhenqiang.c...@linaro.org>
* gcc.target/i386/pr61225.c: New test.
diff --git a/gcc/combine.c b/gcc/combine.c
index 53ac1d6..42098ab 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -412,7 +412,7 @@ static int cant_combine_insn_p (rtx);
static int can_combine_p (rtx, rtx, rtx, rtx, rtx, rtx, rtx *, rtx *);
static int combinable_i3pat (rtx, rtx *, rtx, rtx, rtx, int, int, rtx *);
static int contains_muldiv (rtx);
-static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx);
+static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx, rtx);
static void undo_all (void);
static void undo_commit (void);
static rtx *find_split_point (rtx *, rtx, bool);
@@ -1099,6 +1099,46 @@ insn_a_feeds_b (rtx a, rtx b)
#endif
return false;
}
+
+/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
+ It returns TRUE, if reg1 == reg2, and no other refer of reg1
+ except A and B. */
+
+static bool
+refer_same_reg_p (rtx a, rtx b)
+{
+ rtx seta = single_set (a);
+ rtx setb = single_set (b);
+
+ if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)
+ || !seta || !setb)
+ return false;
Go ahead and use
|| !setb
|| !setb
It's a bit more vertical space, but I believe closer in line with our
coding standards.
+
+ if (GET_CODE (SET_SRC (seta)) != COMPARE
+ || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC
+ || !REG_P (XEXP (SET_SRC (seta), 0))
+ || !const0_rtx
+ || !REG_P (SET_SRC (setb))
+ || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0)))
+ return false;
What's the !const0_rtx test here? Don't you want to test some object
from SETA against const0_rtx? Also note that you may need to test
against CONST0_RTX (mode)
@@ -1431,6 +1468,50 @@ combine_instructions (rtx f, unsigned int nregs)
}
}
+ /* Try to combine a compare insn that sets CC
+ with a preceding insn that can set CC, and maybe with its
+ logical predecessor as well.
+ We need this special code because data flow connections
+ do not get entered in LOG_LINKS. */
So you'd want to be more specific about what dataflow connections are
not in the LOG_LINKS that we want.
It feels to me like we're missing the anti-dependence links on CC and
that there's a general aspect to combine missing here. But I want to
hold off on final judgement until I know more.
I also wonder if compare-elim ought to be helping here. Isn't that the
point here, to eliminate the comparison and instead get it for free as
part of the arithmetic? If so, is it the fact that we have memory
references that prevents compare-elim from kicking in?
jeff