Richard,

This regression was introduced by Kai's patch:

http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html

2011-06-27  Kai Tietz  <kti...@redhat.com>

        * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve
        type sinking.
        * tree-ssa-math-opts.c (execute_optimize_bswap): Separate
        search for di/si mode patterns for finding widest match.

Is it sufficient for you to accept my patch?

Best regards.
yuri.

2013/2/21 Richard Biener <richard.guent...@gmail.com>:
> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Richard,
>>
>> As we know, Kai is working on this problem for 4.9, and I assume that
>> type sinking will be deleted from the forwprop pass. Could we stay with this
>> fix until a more general fix is done?
>
> Well, unless you show it is a regression the patch is not applicable for 4.8
> anyway.  Not sure if the code will be deleted from forwprop pass in 4.9
> either,
> it is after all a canonicalization - fold seems to perform the opposite one
> though:
>
>       /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>          constants (if x has signed type, the sign bit cannot be set
>          in c).  This folds extension into the BIT_AND_EXPR.
>
> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type hoisting,
> not sinking.  Generally frontends and fold try to narrow operands when
> possible (even though some targets later widen them again because of
> instruction set constraints).
>
> Most of this hoisting code was done to help with lowering logical && and ||,
> I believe.  Looking at the testcases added tells us that matching
> the two patterns as done now helps them, but only because that pattern
> feeds single-operand instructions that then simplify.  So doing the transform
> starting from those single-operand instructions instead looks like a better
> fix (op !=/== 0/1 and (T) op) and also would not disagree with the
> canonicalization done by fold.
>
> Richard.
>
>> I can also propose introducing a new hook for it, but I need to know your
>> opinion since we don't want to waste our time on preparing dead
>> patches. Note that x86 supports all short types in HW, and such type
>> sinking is usually useless if short types are involved.
>>
>> Best regards.
>> Yuri.
>>
>>
>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>:
>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> First of all, your proposal to move type sinking to the end of the
>>>> function does not work, since we handle each statement in the function and
>>>> we do not want the 1st type folding of X & C to happen.
>>>> Note that we have the following sequence of gimple before forwprop1:
>>>>
>>>>   x.0_10 = (signed char) x_8;
>>>>   _11 = x.0_10 & 1;
>>>>   _12 = (signed char) y_9;
>>>>   _13 = _12 & 1;
>>>>   _14 = _11 ^ _13;
>>>
>>> Ah, indeed.  Reminds me of some of my dead patches that separated
>>> forwprop into a forward and backward stage.  Of course then you have
>>> the ordering issue of whether to first forward or backward.
>>>
>>> Which means that I bet you can construct a testcase that with
>>> your change is no longer optimized (just make pushing the conversion
>>> make the types _match_).  Which is always the case
>>> with this kind of local pattern-matching transforms.
>>>
>>> Currently forwprop processes leafs of expression trees first (well, inside
>>> a basic-block), similar to how fold () is supposed to be operated, based
>>> on the idea that simplified / canonicalized leafs help keep pattern
>>> recognition simple and cost considerations more accurate.
>>>
>>> When one order works better than another you always have to consider
>>> that the user could already have written the code in a way that results
>>> in the input that isn't well handled.
>>>
>>> Not that this helps very much for the situation ;)
>>>
>>> But I don't like the use of first_pass_instance ... and the fix isn't
>>> an improvement but just a hack for the benchmark.
>>>
>>> Richard.
>>>
>>>> I also added a comment to my fix and created a new test for it. I also
>>>> checked that this test passes with the patched compiler only. So the
>>>> ChangeLog was also modified:
>>>>
>>>> ChangeLog
>>>>
>>>> 2013-02-20  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>
>>>>         PR tree-optimization/56175
>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type sinking
>>>>         at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C
>>>>         for short integer types.
>>>>         * gcc.dg/pr56175.c: New test.
>>>>
>>>>
>>>>
>>>>
>>>> 2013/2/20 Richard Biener <richard.guent...@gmail.com>:
>>>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> This patch aims to recognize the (A & C) ^ (B & C) -> (A ^ B) & C
>>>>>> pattern in simplify_bitwise_binary for short integer types.
>>>>>> The fix is very simple: we turn off short type sinking in the
>>>>>> first pass of forward propagation, which gives a
>>>>>> +10% speedup for the important benchmark Coremark 1.0 on x86 Atom and
>>>>>> +5-7% on other x86 platforms.
>>>>>> Bootstrapping and regression testing were successful on x86-64.
>>>>>>
>>>>>> Is it Ok for trunk?
>>>>>
>>>>> It definitely needs a comment before the checks.
>>>>>
>>>>> Also I think it simply shows that the code is placed at the wrong spot.
>>>>> Simply moving it down in simplify_bitwise_binary to be done the very last
>>>>> should get both of the effects done.
>>>>>
>>>>> Can you rework the patch according to that?
>>>>>
>>>>> You also miss a testcase, we should make sure to not regress again here.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> ChangeLog.
>>>>>>
>>>>>> 2013-02-20  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>
>>>>>>         PR tree-optimization/56175
>>>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type sinking
>>>>>>         at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C
>>>>>>         for short integer types.

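For reference, the identity this thread is about, (A & C) ^ (B & C) -> (A ^ B) & C,
can be checked exhaustively for 8-bit operands. The following is only a sketch
in plain C, not GCC code: the function names (xor_of_masked, masked_xor,
count_mismatches, check_mask_one) are hypothetical and chosen for illustration.
It compares the unfactored form, which matches the GIMPLE sequence quoted above
with C == 1, against the factored form the patch wants forwprop to produce.

```c
#include <assert.h>
#include <stdint.h>

/* Unfactored form: two masks followed by one xor.  This mirrors the
   shape of the GIMPLE quoted in the thread (_11 ^ _13).  */
int8_t xor_of_masked(int8_t a, int8_t b, int8_t c)
{
    return (int8_t)((a & c) ^ (b & c));
}

/* Factored form: one xor followed by one mask.  */
int8_t masked_xor(int8_t a, int8_t b, int8_t c)
{
    return (int8_t)((a ^ b) & c);
}

/* Compare both forms for every pair of 8-bit values a and b under
   mask c and return the number of pairs where they differ.  */
unsigned count_mismatches(int8_t c)
{
    unsigned mismatches = 0;

    for (int a = INT8_MIN; a <= INT8_MAX; a++)
        for (int b = INT8_MIN; b <= INT8_MAX; b++)
            if (xor_of_masked((int8_t)a, (int8_t)b, c)
                != masked_xor((int8_t)a, (int8_t)b, c))
                mismatches++;

    return mismatches;
}

/* Assumption: mask 1 is used here only because it is the constant
   that appears in the quoted GIMPLE dump.  */
void check_mask_one(void)
{
    assert(count_mismatches(1) == 0);
}
```

Calling check_mask_one() from a small main() is enough to run the check. The
factored form saves one AND per expression, which is why recognizing the
pattern matters for short integer types in the benchmark discussed above.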