On Mon, Feb 25, 2013 at 12:39 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
> Hi Richard,
>
> Please, prepare the patch as you proposed.
> I checked that it helps Coremrak 1.0.

The attached works for me (works for the testcase, that is, tree forwprop
performs the requested optimization).

Richard.

> Best regards.
> Yuri.
>
> 2013/2/25 Richard Biener <richard.guent...@gmail.com>:
>> On Thu, Feb 21, 2013 at 4:42 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>> Richard,
>>>
>>> This does not fit to my fix  since if we have another pattern like
>>>     t = (u8)((x & 1) ^ ((u8)y & 1));
>>> where y has short type, for rhs operand type sinkning (or hoisting as
>>> you prefer) still will be performed but I don't see any reason for
>>> converting  (u8)y & 1 --> (u8)(y & 1) if y has u16 type for all
>>> targets including x86.
>>
>> As I said we shouldn't do that and the tests I suggested would make sure
>> we don't.  Even if I mistyped them.  Put in the same condition as in the
>> (type) X op (type) Y hoisting which is exactly there to avoid widening
>> the operands to 'op' with the hoisting.
>>
>> I can produce a patch later today.
>>
>> Richard.
>>
>>> Yuri.
>>>
>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>:
>>>> On Thu, Feb 21, 2013 at 4:16 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>> wrote:
>>>>> Richard,
>>>>>
>>>>> Sorry for my previous message - I did not undersatnd it properly.
>>>>>
>>>>> Anyway I proposed another fix that avoid  (type) x & c --> (type) (x &
>>>>> (type-x) c) transformation if x has short type:
>>>>>
>>>>> +++ gcc/tree-ssa-forwprop.c     (working copy)
>>>>> @@ -1789,8 +1789,11 @@
>>>>>    defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2);
>>>>>    defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2);
>>>>>
>>>>> -  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)).  */
>>>>> +  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)).
>>>>> +     Do that only if X has not short type.  */
>>>>>    if (TREE_CODE (arg2) == INTEGER_CST
>>>>> +      && (TYPE_PRECISION (TREE_TYPE (arg1))
>>>>> +          >= TYPE_PRECISION (integer_type_node))
>>>>>        && CONVERT_EXPR_CODE_P (def1_code)
>>>>>        && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1))
>>>>>        && int_fits_type_p (arg2, TREE_TYPE (def1_arg1)))
>>>>>
>>>>> Does this fix look suitable?
>>>>
>>>> I think the fix is to disable the transform if it widens the operation.
>>>> Thus, sth like
>>>>
>>>>     if (TREE_CODE (arg2) == INTEGER_CST
>>>>        && CONVERT_EXPR_CODE_P (def1_code)
>>>>        && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1))
>>>> +     && (TYPE_PRECISION (TREE_TYPE (arg1))
>>>> +            >= TYPE_PRECISION (TREE_TYPE (def1_arg1)))
>>>>        && int_fits_type_p (arg2, TREE_TYPE (def1_arg1)))
>>>>
>>>> see the restriction we place on the transform for the (T1) x & (T2) y case:
>>>>
>>>>   /* For bitwise binary operations apply operand conversions to the
>>>>      binary operation result instead of to the operands.  This allows
>>>>      to combine successive conversions and bitwise binary operations.  */
>>>>   if (CONVERT_EXPR_CODE_P (def1_code)
>>>>       && CONVERT_EXPR_CODE_P (def2_code)
>>>>       && types_compatible_p (TREE_TYPE (def1_arg1), TREE_TYPE (def2_arg1))
>>>>       /* Make sure that the conversion widens the operands, or has same
>>>>          precision,  or that it changes the operation to a bitfield
>>>>          precision.  */
>>>>       && ((TYPE_PRECISION (TREE_TYPE (def1_arg1))
>>>>            <= TYPE_PRECISION (TREE_TYPE (arg1)))
>>>>           || (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (arg1)))
>>>>               != MODE_INT)
>>>>           || (TYPE_PRECISION (TREE_TYPE (arg1))
>>>>               != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg1))))))
>>>>
>>>> ideally you'd split out that condition into a predicate function taking the
>>>> two type and use it in both places.
>>>>
>>>> Richard.
>>>>
>>>>> Yuri.
>>>>>
>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>:
>>>>>> On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> I double checked that with and without my fix compiler produces the
>>>>>>> same output with -fdump-tree-optimized.
>>>>>>
>>>>>> For what testcase?
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> What patch did you apply? I think that you should apply the second one
>>>>>>> - 56175.diff.
>>>>>>>
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> This regression was introduced by Kai
>>>>>>>>>
>>>>>>>>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html
>>>>>>>>>
>>>>>>>>> 2011-06-27  Kai Tietz  <kti...@redhat.com>
>>>>>>>>>
>>>>>>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve
>>>>>>>>>         type sinking.
>>>>>>>>>         * tree-ssa-math-opts.c (execute_optimize_bswap): Separate
>>>>>>>>>         search for di/si mode patterns for finding widest match.
>>>>>>>>>
>>>>>>>>> Is it sufficient for you to accept my patch?
>>>>>>>>
>>>>>>>> No, I still don't think the fix is sound.  A proper fix would revert 
>>>>>>>> the
>>>>>>>> above change (the simplify_bitwise_binary one), watch for testsuite
>>>>>>>> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing)
>>>>>>>> and fix those failures in a better way.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Best regards.
>>>>>>>>> yuri.
>>>>>>>>>
>>>>>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>>>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev 
>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> As we know Kai is working on this problem for 4.9 and I assume that
>>>>>>>>>>> type sinking will be deleted from forwprop pass. Could we stay on 
>>>>>>>>>>> this
>>>>>>>>>>> fix but more common fix will be done.
>>>>>>>>>>
>>>>>>>>>> Well, unless you show it is a regression the patch is not applicable 
>>>>>>>>>> for 4.8
>>>>>>>>>> anyway.  Not sure if the code will be deleted from forwprop pass in 
>>>>>>>>>> 4.9 either,
>>>>>>>>>> it is after all a canonicalization - fold seems to perform the 
>>>>>>>>>> opposite one
>>>>>>>>>> though:
>>>>>>>>>>
>>>>>>>>>>       /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>>>>>>>>>>          constants (if x has signed type, the sign bit cannot be set
>>>>>>>>>>          in c).  This folds extension into the BIT_AND_EXPR.
>>>>>>>>>>
>>>>>>>>>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type 
>>>>>>>>>> hoisting,
>>>>>>>>>> not sinking.  Generally frontends and fold try to narrow operands 
>>>>>>>>>> when
>>>>>>>>>> possible (even though some targets later widen them again because of
>>>>>>>>>> instruction set constraints).
>>>>>>>>>>
>>>>>>>>>> Most of this hoisting code was done to make lowering logical && and 
>>>>>>>>>> ||
>>>>>>>>>> I believe.  Looking at the testcases added tells us that while 
>>>>>>>>>> matching
>>>>>>>>>> the two patterns as done now helps them but only because that pattern
>>>>>>>>>> feeds single-operand instructions that then simplify.  So doing the 
>>>>>>>>>> transform
>>>>>>>>>> starting from that single-operand instructions instead looks like a 
>>>>>>>>>> better
>>>>>>>>>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the
>>>>>>>>>> canonicalization done by fold.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> I also can propose to introduce new hook for it but need to know 
>>>>>>>>>>> your
>>>>>>>>>>> opinion since we don't went to waste our time on preparing dead
>>>>>>>>>>> patches. Note that x86 supports all short types in HW and such type
>>>>>>>>>>> sinkning is usually useless if short types are involved.
>>>>>>>>>>>
>>>>>>>>>>> Best regards.
>>>>>>>>>>> Yuri.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>>>>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev 
>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> First of all, your proposal to move type sinking to the end of
>>>>>>>>>>>>> function does not work since we handle each statement in function 
>>>>>>>>>>>>> and
>>>>>>>>>>>>> we want that 1st type folding of X & C will not happen.
>>>>>>>>>>>>> Note that we have the following sequence of gimple before 
>>>>>>>>>>>>> forwprop1:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    x.0_10 = (signed char) x_8;
>>>>>>>>>>>>>   _11 = x.0_10 & 1;
>>>>>>>>>>>>>   _12 = (signed char) y_9;
>>>>>>>>>>>>>   _13 = _12 & 1;
>>>>>>>>>>>>>   _14 = _11 ^ _13;
>>>>>>>>>>>>
>>>>>>>>>>>> Ah, indeed.  Reminds me of some of my dead patches that separated
>>>>>>>>>>>> forwprop into a forward and backward stage.  Of course then you 
>>>>>>>>>>>> have
>>>>>>>>>>>> the ordering issue of whether to first forward or backward.
>>>>>>>>>>>>
>>>>>>>>>>>> Which means that I bet you can construct a testcase that with
>>>>>>>>>>>> your change is no longer optimized (just make pushing the 
>>>>>>>>>>>> conversion
>>>>>>>>>>>> make the types _match_).  Which is always the case
>>>>>>>>>>>> with this kind of local pattern-matching transforms.
>>>>>>>>>>>>
>>>>>>>>>>>> Currently forwprop processes leafs of expression trees first 
>>>>>>>>>>>> (well, inside
>>>>>>>>>>>> a basic-block), similar to how fold () is supposed to be operated, 
>>>>>>>>>>>> based
>>>>>>>>>>>> on the idea that simplified / canonicalized leafs helps keeping 
>>>>>>>>>>>> pattern
>>>>>>>>>>>> recognition simple and cost considerations more accurate.
>>>>>>>>>>>>
>>>>>>>>>>>> When one order works better than another you always have to 
>>>>>>>>>>>> consider
>>>>>>>>>>>> that the user could already have written the code in a way that 
>>>>>>>>>>>> results
>>>>>>>>>>>> in the input that isn't well handled.
>>>>>>>>>>>>
>>>>>>>>>>>> Not that this helps very much for the situation ;)
>>>>>>>>>>>>
>>>>>>>>>>>> But I don't like the use of first_pass_instance ... and the fix 
>>>>>>>>>>>> isn't
>>>>>>>>>>>> an improvement but just a hack for the benchmark.
>>>>>>>>>>>>
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>> I also added comment to my fix and create new test for it. I also
>>>>>>>>>>>>> checked that this test is passed with patched compiler  only. So
>>>>>>>>>>>>> Change Log was also modified:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ChangeLog
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013-02-20  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>>         PR tree-optimization/56175
>>>>>>>>>>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid 
>>>>>>>>>>>>> type sinking
>>>>>>>>>>>>>         at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A 
>>>>>>>>>>>>> ^ B) & C
>>>>>>>>>>>>>         for short integer types.
>>>>>>>>>>>>>         * gcc.dg/pr56175.c: New test.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013/2/20 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>>>>>>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev 
>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & 
>>>>>>>>>>>>>>> C
>>>>>>>>>>>>>>> pattern in simpify_bitwise_binary for short integer types.
>>>>>>>>>>>>>>> The fix is very simple - we simply turn off short type sinking 
>>>>>>>>>>>>>>> at the
>>>>>>>>>>>>>>> first pass of forward propagation allows to get
>>>>>>>>>>>>>>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom 
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> +5-7% for other x86 platforms too.
>>>>>>>>>>>>>>> Bootstrapping and regression testing were successful on x86-64.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is it Ok for trunk?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It definitely needs a comment before the checks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also I think it simply shows that the code is placed at the 
>>>>>>>>>>>>>> wrong spot.
>>>>>>>>>>>>>> Simply moving it down in simplify_bitwise_binary to be done the 
>>>>>>>>>>>>>> very last
>>>>>>>>>>>>>> should get both of the effects done.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you rework the patch according to that?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You also miss a testcase, we should make sure to not regress 
>>>>>>>>>>>>>> again here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ChangeLog.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2013-02-20  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         PR tree-optimization/56175
>>>>>>>>>>>>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid 
>>>>>>>>>>>>>>> type sinking
>>>>>>>>>>>>>>>         at 1st forwprop pass to recognize (A & C) ^ (B & C) -> 
>>>>>>>>>>>>>>> (A ^ B) & C
>>>>>>>>>>>>>>>         for short integer types.

Attachment: fix-pr56175
Description: Binary data

Reply via email to