Richard, Sorry for my previous message - I did not undersatnd it properly.
Anyway I proposed another fix that avoid (type) x & c --> (type) (x & (type-x) c) transformation if x has short type: +++ gcc/tree-ssa-forwprop.c (working copy) @@ -1789,8 +1789,11 @@ defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2); defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2); - /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). */ + /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). + Do that only if X has not short type. */ if (TREE_CODE (arg2) == INTEGER_CST + && (TYPE_PRECISION (TREE_TYPE (arg1)) + >= TYPE_PRECISION (integer_type_node)) && CONVERT_EXPR_CODE_P (def1_code) && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) && int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) Does this fix look suitable? Yuri. 2013/2/21 Richard Biener <richard.guent...@gmail.com>: > On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Richard, >> >> I double checked that with and without my fix compiler produces the >> same output with -fdump-tree-optimized. > > For what testcase? > > Richard. > >> What patch did you apply? I think that you should apply the second one >> - 56175.diff. >> >> Yuri. >> >> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>> Richard, >>>> >>>> This regression was introduced by Kai >>>> >>>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html >>>> >>>> 2011-06-27 Kai Tietz <kti...@redhat.com> >>>> >>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve >>>> type sinking. >>>> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate >>>> search for di/si mode patterns for finding widest match. >>>> >>>> Is it sufficient for you to accept my patch? >>> >>> No, I still don't think the fix is sound. A proper fix would revert the >>> above change (the simplify_bitwise_binary one), watch for testsuite >>> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing) >>> and fix those failures in a better way. >>> >>> Richard. >>> >>>> Best regards. >>>> yuri. >>>> >>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>> wrote: >>>>>> Richard, >>>>>> >>>>>> As we know Kai is working on this problem for 4.9 and I assume that >>>>>> type sinking will be deleted from forwprop pass. Could we stay on this >>>>>> fix but more common fix will be done. >>>>> >>>>> Well, unless you show it is a regression the patch is not applicable for >>>>> 4.8 >>>>> anyway. Not sure if the code will be deleted from forwprop pass in 4.9 >>>>> either, >>>>> it is after all a canonicalization - fold seems to perform the opposite >>>>> one >>>>> though: >>>>> >>>>> /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer >>>>> constants (if x has signed type, the sign bit cannot be set >>>>> in c). This folds extension into the BIT_AND_EXPR. >>>>> >>>>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type >>>>> hoisting, >>>>> not sinking. Generally frontends and fold try to narrow operands when >>>>> possible (even though some targets later widen them again because of >>>>> instruction set constraints). >>>>> >>>>> Most of this hoisting code was done to make lowering logical && and || >>>>> I believe. Looking at the testcases added tells us that while matching >>>>> the two patterns as done now helps them but only because that pattern >>>>> feeds single-operand instructions that then simplify. So doing the >>>>> transform >>>>> starting from that single-operand instructions instead looks like a better >>>>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the >>>>> canonicalization done by fold. >>>>> >>>>> Richard. >>>>> >>>>>> I also can propose to introduce new hook for it but need to know your >>>>>> opinion since we don't went to waste our time on preparing dead >>>>>> patches. Note that x86 supports all short types in HW and such type >>>>>> sinkning is usually useless if short types are involved. >>>>>> >>>>>> Best regards. >>>>>> Yuri. >>>>>> >>>>>> >>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>> wrote: >>>>>>>> Richard, >>>>>>>> >>>>>>>> First of all, your proposal to move type sinking to the end of >>>>>>>> function does not work since we handle each statement in function and >>>>>>>> we want that 1st type folding of X & C will not happen. >>>>>>>> Note that we have the following sequence of gimple before forwprop1: >>>>>>>> >>>>>>>> x.0_10 = (signed char) x_8; >>>>>>>> _11 = x.0_10 & 1; >>>>>>>> _12 = (signed char) y_9; >>>>>>>> _13 = _12 & 1; >>>>>>>> _14 = _11 ^ _13; >>>>>>> >>>>>>> Ah, indeed. Reminds me of some of my dead patches that separated >>>>>>> forwprop into a forward and backward stage. Of course then you have >>>>>>> the ordering issue of whether to first forward or backward. >>>>>>> >>>>>>> Which means that I bet you can construct a testcase that with >>>>>>> your change is no longer optimized (just make pushing the conversion >>>>>>> make the types _match_). Which is always the case >>>>>>> with this kind of local pattern-matching transforms. >>>>>>> >>>>>>> Currently forwprop processes leafs of expression trees first (well, >>>>>>> inside >>>>>>> a basic-block), similar to how fold () is supposed to be operated, based >>>>>>> on the idea that simplified / canonicalized leafs helps keeping pattern >>>>>>> recognition simple and cost considerations more accurate. >>>>>>> >>>>>>> When one order works better than another you always have to consider >>>>>>> that the user could already have written the code in a way that results >>>>>>> in the input that isn't well handled. >>>>>>> >>>>>>> Not that this helps very much for the situation ;) >>>>>>> >>>>>>> But I don't like the use of first_pass_instance ... and the fix isn't >>>>>>> an improvement but just a hack for the benchmark. >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> I also added comment to my fix and create new test for it. I also >>>>>>>> checked that this test is passed with patched compiler only. So >>>>>>>> Change Log was also modified: >>>>>>>> >>>>>>>> ChangeLog >>>>>>>> >>>>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> >>>>>>>> PR tree-optimization/56175 >>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type >>>>>>>> sinking >>>>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) >>>>>>>> & C >>>>>>>> for short integer types. >>>>>>>> * gcc.dg/pr56175.c: New test. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 2013/2/20 Richard Biener <richard.guent...@gmail.com>: >>>>>>>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>> wrote: >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C >>>>>>>>>> pattern in simpify_bitwise_binary for short integer types. >>>>>>>>>> The fix is very simple - we simply turn off short type sinking at the >>>>>>>>>> first pass of forward propagation allows to get >>>>>>>>>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and >>>>>>>>>> +5-7% for other x86 platforms too. >>>>>>>>>> Bootstrapping and regression testing were successful on x86-64. >>>>>>>>>> >>>>>>>>>> Is it Ok for trunk? >>>>>>>>> >>>>>>>>> It definitely needs a comment before the checks. >>>>>>>>> >>>>>>>>> Also I think it simply shows that the code is placed at the wrong >>>>>>>>> spot. >>>>>>>>> Simply moving it down in simplify_bitwise_binary to be done the very >>>>>>>>> last >>>>>>>>> should get both of the effects done. >>>>>>>>> >>>>>>>>> Can you rework the patch according to that? >>>>>>>>> >>>>>>>>> You also miss a testcase, we should make sure to not regress again >>>>>>>>> here. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>> ChangeLog. >>>>>>>>>> >>>>>>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>> >>>>>>>>>> PR tree-optimization/56175 >>>>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid type >>>>>>>>>> sinking >>>>>>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ >>>>>>>>>> B) & C >>>>>>>>>> for short integer types.