Hi Richard, Please, prepare the patch as you proposed. I checked that it helps Coremrak 1.0.
Best regards. Yuri. 2013/2/25 Richard Biener <richard.guent...@gmail.com>: > On Thu, Feb 21, 2013 at 4:42 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Richard, >> >> This does not fit to my fix since if we have another pattern like >> t = (u8)((x & 1) ^ ((u8)y & 1)); >> where y has short type, for rhs operand type sinkning (or hoisting as >> you prefer) still will be performed but I don't see any reason for >> converting (u8)y & 1 --> (u8)(y & 1) if y has u16 type for all >> targets including x86. > > As I said we shouldn't do that and the tests I suggested would make sure > we don't. Even if I mistyped them. Put in the same condition as in the > (type) X op (type) Y hoisting which is exactly there to avoid widening > the operands to 'op' with the hoisting. > > I can produce a patch later today. > > Richard. > >> Yuri. >> >> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>> On Thu, Feb 21, 2013 at 4:16 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>> Richard, >>>> >>>> Sorry for my previous message - I did not undersatnd it properly. >>>> >>>> Anyway I proposed another fix that avoid (type) x & c --> (type) (x & >>>> (type-x) c) transformation if x has short type: >>>> >>>> +++ gcc/tree-ssa-forwprop.c (working copy) >>>> @@ -1789,8 +1789,11 @@ >>>> defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2); >>>> defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2); >>>> >>>> - /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). */ >>>> + /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). >>>> + Do that only if X has not short type. */ >>>> if (TREE_CODE (arg2) == INTEGER_CST >>>> + && (TYPE_PRECISION (TREE_TYPE (arg1)) >>>> + >= TYPE_PRECISION (integer_type_node)) >>>> && CONVERT_EXPR_CODE_P (def1_code) >>>> && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) >>>> && int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) >>>> >>>> Does this fix look suitable? >>> >>> I think the fix is to disable the transform if it widens the operation. >>> Thus, sth like >>> >>> if (TREE_CODE (arg2) == INTEGER_CST >>> && CONVERT_EXPR_CODE_P (def1_code) >>> && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) >>> + && (TYPE_PRECISION (TREE_TYPE (arg1)) >>> + >= TYPE_PRECISION (TREE_TYPE (def1_arg1))) >>> && int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) >>> >>> see the restriction we place on the transform for the (T1) x & (T2) y case: >>> >>> /* For bitwise binary operations apply operand conversions to the >>> binary operation result instead of to the operands. This allows >>> to combine successive conversions and bitwise binary operations. */ >>> if (CONVERT_EXPR_CODE_P (def1_code) >>> && CONVERT_EXPR_CODE_P (def2_code) >>> && types_compatible_p (TREE_TYPE (def1_arg1), TREE_TYPE (def2_arg1)) >>> /* Make sure that the conversion widens the operands, or has same >>> precision, or that it changes the operation to a bitfield >>> precision. */ >>> && ((TYPE_PRECISION (TREE_TYPE (def1_arg1)) >>> <= TYPE_PRECISION (TREE_TYPE (arg1))) >>> || (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (arg1))) >>> != MODE_INT) >>> || (TYPE_PRECISION (TREE_TYPE (arg1)) >>> != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg1)))))) >>> >>> ideally you'd split out that condition into a predicate function taking the >>> two type and use it in both places. >>> >>> Richard. >>> >>>> Yuri. >>>> >>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>> On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>> wrote: >>>>>> Richard, >>>>>> >>>>>> I double checked that with and without my fix compiler produces the >>>>>> same output with -fdump-tree-optimized. >>>>> >>>>> For what testcase? >>>>> >>>>> Richard. >>>>> >>>>>> What patch did you apply? I think that you should apply the second one >>>>>> - 56175.diff. >>>>>> >>>>>> Yuri. >>>>>> >>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>>> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>> wrote: >>>>>>>> Richard, >>>>>>>> >>>>>>>> This regression was introduced by Kai >>>>>>>> >>>>>>>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html >>>>>>>> >>>>>>>> 2011-06-27 Kai Tietz <kti...@redhat.com> >>>>>>>> >>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve >>>>>>>> type sinking. >>>>>>>> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate >>>>>>>> search for di/si mode patterns for finding widest match. >>>>>>>> >>>>>>>> Is it sufficient for you to accept my patch? >>>>>>> >>>>>>> No, I still don't think the fix is sound. A proper fix would revert the >>>>>>> above change (the simplify_bitwise_binary one), watch for testsuite >>>>>>> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing) >>>>>>> and fix those failures in a better way. >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> Best regards. >>>>>>>> yuri. >>>>>>>> >>>>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>>>>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>> wrote: >>>>>>>>>> Richard, >>>>>>>>>> >>>>>>>>>> As we know Kai is working on this problem for 4.9 and I assume that >>>>>>>>>> type sinking will be deleted from forwprop pass. Could we stay on >>>>>>>>>> this >>>>>>>>>> fix but more common fix will be done. >>>>>>>>> >>>>>>>>> Well, unless you show it is a regression the patch is not applicable >>>>>>>>> for 4.8 >>>>>>>>> anyway. Not sure if the code will be deleted from forwprop pass in >>>>>>>>> 4.9 either, >>>>>>>>> it is after all a canonicalization - fold seems to perform the >>>>>>>>> opposite one >>>>>>>>> though: >>>>>>>>> >>>>>>>>> /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer >>>>>>>>> constants (if x has signed type, the sign bit cannot be set >>>>>>>>> in c). This folds extension into the BIT_AND_EXPR. >>>>>>>>> >>>>>>>>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type >>>>>>>>> hoisting, >>>>>>>>> not sinking. Generally frontends and fold try to narrow operands when >>>>>>>>> possible (even though some targets later widen them again because of >>>>>>>>> instruction set constraints). >>>>>>>>> >>>>>>>>> Most of this hoisting code was done to make lowering logical && and || >>>>>>>>> I believe. Looking at the testcases added tells us that while >>>>>>>>> matching >>>>>>>>> the two patterns as done now helps them but only because that pattern >>>>>>>>> feeds single-operand instructions that then simplify. So doing the >>>>>>>>> transform >>>>>>>>> starting from that single-operand instructions instead looks like a >>>>>>>>> better >>>>>>>>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the >>>>>>>>> canonicalization done by fold. >>>>>>>>> >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>> I also can propose to introduce new hook for it but need to know your >>>>>>>>>> opinion since we don't went to waste our time on preparing dead >>>>>>>>>> patches. Note that x86 supports all short types in HW and such type >>>>>>>>>> sinkning is usually useless if short types are involved. >>>>>>>>>> >>>>>>>>>> Best regards. >>>>>>>>>> Yuri. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>>>>>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev >>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>> Richard, >>>>>>>>>>>> >>>>>>>>>>>> First of all, your proposal to move type sinking to the end of >>>>>>>>>>>> function does not work since we handle each statement in function >>>>>>>>>>>> and >>>>>>>>>>>> we want that 1st type folding of X & C will not happen. >>>>>>>>>>>> Note that we have the following sequence of gimple before >>>>>>>>>>>> forwprop1: >>>>>>>>>>>> >>>>>>>>>>>> x.0_10 = (signed char) x_8; >>>>>>>>>>>> _11 = x.0_10 & 1; >>>>>>>>>>>> _12 = (signed char) y_9; >>>>>>>>>>>> _13 = _12 & 1; >>>>>>>>>>>> _14 = _11 ^ _13; >>>>>>>>>>> >>>>>>>>>>> Ah, indeed. Reminds me of some of my dead patches that separated >>>>>>>>>>> forwprop into a forward and backward stage. Of course then you have >>>>>>>>>>> the ordering issue of whether to first forward or backward. >>>>>>>>>>> >>>>>>>>>>> Which means that I bet you can construct a testcase that with >>>>>>>>>>> your change is no longer optimized (just make pushing the conversion >>>>>>>>>>> make the types _match_). Which is always the case >>>>>>>>>>> with this kind of local pattern-matching transforms. >>>>>>>>>>> >>>>>>>>>>> Currently forwprop processes leafs of expression trees first (well, >>>>>>>>>>> inside >>>>>>>>>>> a basic-block), similar to how fold () is supposed to be operated, >>>>>>>>>>> based >>>>>>>>>>> on the idea that simplified / canonicalized leafs helps keeping >>>>>>>>>>> pattern >>>>>>>>>>> recognition simple and cost considerations more accurate. >>>>>>>>>>> >>>>>>>>>>> When one order works better than another you always have to consider >>>>>>>>>>> that the user could already have written the code in a way that >>>>>>>>>>> results >>>>>>>>>>> in the input that isn't well handled. >>>>>>>>>>> >>>>>>>>>>> Not that this helps very much for the situation ;) >>>>>>>>>>> >>>>>>>>>>> But I don't like the use of first_pass_instance ... and the fix >>>>>>>>>>> isn't >>>>>>>>>>> an improvement but just a hack for the benchmark. >>>>>>>>>>> >>>>>>>>>>> Richard. >>>>>>>>>>> >>>>>>>>>>>> I also added comment to my fix and create new test for it. I also >>>>>>>>>>>> checked that this test is passed with patched compiler only. So >>>>>>>>>>>> Change Log was also modified: >>>>>>>>>>>> >>>>>>>>>>>> ChangeLog >>>>>>>>>>>> >>>>>>>>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>> >>>>>>>>>>>> PR tree-optimization/56175 >>>>>>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid >>>>>>>>>>>> type sinking >>>>>>>>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A >>>>>>>>>>>> ^ B) & C >>>>>>>>>>>> for short integer types. >>>>>>>>>>>> * gcc.dg/pr56175.c: New test. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 2013/2/20 Richard Biener <richard.guent...@gmail.com>: >>>>>>>>>>>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev >>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>> >>>>>>>>>>>>>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C >>>>>>>>>>>>>> pattern in simpify_bitwise_binary for short integer types. >>>>>>>>>>>>>> The fix is very simple - we simply turn off short type sinking >>>>>>>>>>>>>> at the >>>>>>>>>>>>>> first pass of forward propagation allows to get >>>>>>>>>>>>>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and >>>>>>>>>>>>>> +5-7% for other x86 platforms too. >>>>>>>>>>>>>> Bootstrapping and regression testing were successful on x86-64. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is it Ok for trunk? >>>>>>>>>>>>> >>>>>>>>>>>>> It definitely needs a comment before the checks. >>>>>>>>>>>>> >>>>>>>>>>>>> Also I think it simply shows that the code is placed at the wrong >>>>>>>>>>>>> spot. >>>>>>>>>>>>> Simply moving it down in simplify_bitwise_binary to be done the >>>>>>>>>>>>> very last >>>>>>>>>>>>> should get both of the effects done. >>>>>>>>>>>>> >>>>>>>>>>>>> Can you rework the patch according to that? >>>>>>>>>>>>> >>>>>>>>>>>>> You also miss a testcase, we should make sure to not regress >>>>>>>>>>>>> again here. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Richard. >>>>>>>>>>>>> >>>>>>>>>>>>>> ChangeLog. >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>> >>>>>>>>>>>>>> PR tree-optimization/56175 >>>>>>>>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid >>>>>>>>>>>>>> type sinking >>>>>>>>>>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> >>>>>>>>>>>>>> (A ^ B) & C >>>>>>>>>>>>>> for short integer types.