On Mon, Feb 25, 2013 at 12:39 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Hi Richard, > > Please, prepare the patch as you proposed. > I checked that it helps Coremrak 1.0.
The attached works for me (works for the testcase, that is, tree forwprop performs the requested optimization). Richard. > Best regards. > Yuri. > > 2013/2/25 Richard Biener <richard.guent...@gmail.com>: >> On Thu, Feb 21, 2013 at 4:42 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Richard, >>> >>> This does not fit to my fix since if we have another pattern like >>> t = (u8)((x & 1) ^ ((u8)y & 1)); >>> where y has short type, for rhs operand type sinkning (or hoisting as >>> you prefer) still will be performed but I don't see any reason for >>> converting (u8)y & 1 --> (u8)(y & 1) if y has u16 type for all >>> targets including x86. >> >> As I said we shouldn't do that and the tests I suggested would make sure >> we don't. Even if I mistyped them. Put in the same condition as in the >> (type) X op (type) Y hoisting which is exactly there to avoid widening >> the operands to 'op' with the hoisting. >> >> I can produce a patch later today. >> >> Richard. >> >>> Yuri. >>> >>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>> On Thu, Feb 21, 2013 at 4:16 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>> wrote: >>>>> Richard, >>>>> >>>>> Sorry for my previous message - I did not undersatnd it properly. >>>>> >>>>> Anyway I proposed another fix that avoid (type) x & c --> (type) (x & >>>>> (type-x) c) transformation if x has short type: >>>>> >>>>> +++ gcc/tree-ssa-forwprop.c (working copy) >>>>> @@ -1789,8 +1789,11 @@ >>>>> defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2); >>>>> defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2); >>>>> >>>>> - /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). */ >>>>> + /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). >>>>> + Do that only if X has not short type. */ >>>>> if (TREE_CODE (arg2) == INTEGER_CST >>>>> + && (TYPE_PRECISION (TREE_TYPE (arg1)) >>>>> + >= TYPE_PRECISION (integer_type_node)) >>>>> && CONVERT_EXPR_CODE_P (def1_code) >>>>> && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) >>>>> && int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) >>>>> >>>>> Does this fix look suitable? >>>> >>>> I think the fix is to disable the transform if it widens the operation. >>>> Thus, sth like >>>> >>>> if (TREE_CODE (arg2) == INTEGER_CST >>>> && CONVERT_EXPR_CODE_P (def1_code) >>>> && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) >>>> + && (TYPE_PRECISION (TREE_TYPE (arg1)) >>>> + >= TYPE_PRECISION (TREE_TYPE (def1_arg1))) >>>> && int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) >>>> >>>> see the restriction we place on the transform for the (T1) x & (T2) y case: >>>> >>>> /* For bitwise binary operations apply operand conversions to the >>>> binary operation result instead of to the operands. This allows >>>> to combine successive conversions and bitwise binary operations. */ >>>> if (CONVERT_EXPR_CODE_P (def1_code) >>>> && CONVERT_EXPR_CODE_P (def2_code) >>>> && types_compatible_p (TREE_TYPE (def1_arg1), TREE_TYPE (def2_arg1)) >>>> /* Make sure that the conversion widens the operands, or has same >>>> precision, or that it changes the operation to a bitfield >>>> precision. */ >>>> && ((TYPE_PRECISION (TREE_TYPE (def1_arg1)) >>>> <= TYPE_PRECISION (TREE_TYPE (arg1))) >>>> || (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (arg1))) >>>> != MODE_INT) >>>> || (TYPE_PRECISION (TREE_TYPE (arg1)) >>>> != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg1)))))) >>>> >>>> ideally you'd split out that condition into a predicate function taking the >>>> two type and use it in both places. >>>> >>>> Richard. >>>> >>>>> Yuri. >>>>> >>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>> On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> wrote: >>>>>>> Richard, >>>>>>> >>>>>>> I double checked that with and without my fix compiler produces the >>>>>>> same output with -fdump-tree-optimized. >>>>>> >>>>>> For what testcase? >>>>>> >>>>>> Richard. >>>>>> >>>>>>> What patch did you apply? I think that you should apply the second one >>>>>>> - 56175.diff. >>>>>>> >>>>>>> Yuri. >>>>>>> >>>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>>>> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> wrote: >>>>>>>>> Richard, >>>>>>>>> >>>>>>>>> This regression was introduced by Kai >>>>>>>>> >>>>>>>>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html >>>>>>>>> >>>>>>>>> 2011-06-27 Kai Tietz <kti...@redhat.com> >>>>>>>>> >>>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve >>>>>>>>> type sinking. >>>>>>>>> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate >>>>>>>>> search for di/si mode patterns for finding widest match. >>>>>>>>> >>>>>>>>> Is it sufficient for you to accept my patch? >>>>>>>> >>>>>>>> No, I still don't think the fix is sound. A proper fix would revert >>>>>>>> the >>>>>>>> above change (the simplify_bitwise_binary one), watch for testsuite >>>>>>>> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing) >>>>>>>> and fix those failures in a better way. >>>>>>>> >>>>>>>> Richard. >>>>>>>> >>>>>>>>> Best regards. >>>>>>>>> yuri. >>>>>>>>> >>>>>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>>>>>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev >>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>> Richard, >>>>>>>>>>> >>>>>>>>>>> As we know Kai is working on this problem for 4.9 and I assume that >>>>>>>>>>> type sinking will be deleted from forwprop pass. Could we stay on >>>>>>>>>>> this >>>>>>>>>>> fix but more common fix will be done. >>>>>>>>>> >>>>>>>>>> Well, unless you show it is a regression the patch is not applicable >>>>>>>>>> for 4.8 >>>>>>>>>> anyway. Not sure if the code will be deleted from forwprop pass in >>>>>>>>>> 4.9 either, >>>>>>>>>> it is after all a canonicalization - fold seems to perform the >>>>>>>>>> opposite one >>>>>>>>>> though: >>>>>>>>>> >>>>>>>>>> /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer >>>>>>>>>> constants (if x has signed type, the sign bit cannot be set >>>>>>>>>> in c). This folds extension into the BIT_AND_EXPR. >>>>>>>>>> >>>>>>>>>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type >>>>>>>>>> hoisting, >>>>>>>>>> not sinking. Generally frontends and fold try to narrow operands >>>>>>>>>> when >>>>>>>>>> possible (even though some targets later widen them again because of >>>>>>>>>> instruction set constraints). >>>>>>>>>> >>>>>>>>>> Most of this hoisting code was done to make lowering logical && and >>>>>>>>>> || >>>>>>>>>> I believe. Looking at the testcases added tells us that while >>>>>>>>>> matching >>>>>>>>>> the two patterns as done now helps them but only because that pattern >>>>>>>>>> feeds single-operand instructions that then simplify. So doing the >>>>>>>>>> transform >>>>>>>>>> starting from that single-operand instructions instead looks like a >>>>>>>>>> better >>>>>>>>>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the >>>>>>>>>> canonicalization done by fold. >>>>>>>>>> >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>>> I also can propose to introduce new hook for it but need to know >>>>>>>>>>> your >>>>>>>>>>> opinion since we don't went to waste our time on preparing dead >>>>>>>>>>> patches. Note that x86 supports all short types in HW and such type >>>>>>>>>>> sinkning is usually useless if short types are involved. >>>>>>>>>>> >>>>>>>>>>> Best regards. >>>>>>>>>>> Yuri. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>>>>>>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev >>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>> Richard, >>>>>>>>>>>>> >>>>>>>>>>>>> First of all, your proposal to move type sinking to the end of >>>>>>>>>>>>> function does not work since we handle each statement in function >>>>>>>>>>>>> and >>>>>>>>>>>>> we want that 1st type folding of X & C will not happen. >>>>>>>>>>>>> Note that we have the following sequence of gimple before >>>>>>>>>>>>> forwprop1: >>>>>>>>>>>>> >>>>>>>>>>>>> x.0_10 = (signed char) x_8; >>>>>>>>>>>>> _11 = x.0_10 & 1; >>>>>>>>>>>>> _12 = (signed char) y_9; >>>>>>>>>>>>> _13 = _12 & 1; >>>>>>>>>>>>> _14 = _11 ^ _13; >>>>>>>>>>>> >>>>>>>>>>>> Ah, indeed. Reminds me of some of my dead patches that separated >>>>>>>>>>>> forwprop into a forward and backward stage. Of course then you >>>>>>>>>>>> have >>>>>>>>>>>> the ordering issue of whether to first forward or backward. >>>>>>>>>>>> >>>>>>>>>>>> Which means that I bet you can construct a testcase that with >>>>>>>>>>>> your change is no longer optimized (just make pushing the >>>>>>>>>>>> conversion >>>>>>>>>>>> make the types _match_). Which is always the case >>>>>>>>>>>> with this kind of local pattern-matching transforms. >>>>>>>>>>>> >>>>>>>>>>>> Currently forwprop processes leafs of expression trees first >>>>>>>>>>>> (well, inside >>>>>>>>>>>> a basic-block), similar to how fold () is supposed to be operated, >>>>>>>>>>>> based >>>>>>>>>>>> on the idea that simplified / canonicalized leafs helps keeping >>>>>>>>>>>> pattern >>>>>>>>>>>> recognition simple and cost considerations more accurate. >>>>>>>>>>>> >>>>>>>>>>>> When one order works better than another you always have to >>>>>>>>>>>> consider >>>>>>>>>>>> that the user could already have written the code in a way that >>>>>>>>>>>> results >>>>>>>>>>>> in the input that isn't well handled. >>>>>>>>>>>> >>>>>>>>>>>> Not that this helps very much for the situation ;) >>>>>>>>>>>> >>>>>>>>>>>> But I don't like the use of first_pass_instance ... and the fix >>>>>>>>>>>> isn't >>>>>>>>>>>> an improvement but just a hack for the benchmark. >>>>>>>>>>>> >>>>>>>>>>>> Richard. >>>>>>>>>>>> >>>>>>>>>>>>> I also added comment to my fix and create new test for it. I also >>>>>>>>>>>>> checked that this test is passed with patched compiler only. So >>>>>>>>>>>>> Change Log was also modified: >>>>>>>>>>>>> >>>>>>>>>>>>> ChangeLog >>>>>>>>>>>>> >>>>>>>>>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>> >>>>>>>>>>>>> PR tree-optimization/56175 >>>>>>>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid >>>>>>>>>>>>> type sinking >>>>>>>>>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A >>>>>>>>>>>>> ^ B) & C >>>>>>>>>>>>> for short integer types. >>>>>>>>>>>>> * gcc.dg/pr56175.c: New test. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> 2013/2/20 Richard Biener <richard.guent...@gmail.com>: >>>>>>>>>>>>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev >>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & >>>>>>>>>>>>>>> C >>>>>>>>>>>>>>> pattern in simpify_bitwise_binary for short integer types. >>>>>>>>>>>>>>> The fix is very simple - we simply turn off short type sinking >>>>>>>>>>>>>>> at the >>>>>>>>>>>>>>> first pass of forward propagation allows to get >>>>>>>>>>>>>>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> +5-7% for other x86 platforms too. >>>>>>>>>>>>>>> Bootstrapping and regression testing were successful on x86-64. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Is it Ok for trunk? >>>>>>>>>>>>>> >>>>>>>>>>>>>> It definitely needs a comment before the checks. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Also I think it simply shows that the code is placed at the >>>>>>>>>>>>>> wrong spot. >>>>>>>>>>>>>> Simply moving it down in simplify_bitwise_binary to be done the >>>>>>>>>>>>>> very last >>>>>>>>>>>>>> should get both of the effects done. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Can you rework the patch according to that? >>>>>>>>>>>>>> >>>>>>>>>>>>>> You also miss a testcase, we should make sure to not regress >>>>>>>>>>>>>> again here. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> ChangeLog. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> PR tree-optimization/56175 >>>>>>>>>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid >>>>>>>>>>>>>>> type sinking >>>>>>>>>>>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> >>>>>>>>>>>>>>> (A ^ B) & C >>>>>>>>>>>>>>> for short integer types.
fix-pr56175
Description: Binary data