Richard, This regression was introduced by Kai
http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html 2011-06-27 Kai Tietz <kti...@redhat.com> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve type sinking. * tree-ssa-math-opts.c (execute_optimize_bswap): Separate search for di/si mode patterns for finding widest match. Is it sufficient for you to accept my patch? Best regards. yuri. 2013/2/21 Richard Biener <richard.guent...@gmail.com>: > On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Richard, >> >> As we know Kai is working on this problem for 4.9 and I assume that >> type sinking will be deleted from forwprop pass. Could we stay on this >> fix but more common fix will be done. > > Well, unless you show it is a regression the patch is not applicable for 4.8 > anyway. Not sure if the code will be deleted from forwprop pass in 4.9 > either, > it is after all a canonicalization - fold seems to perform the opposite one > though: > > /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer > constants (if x has signed type, the sign bit cannot be set > in c). This folds extension into the BIT_AND_EXPR. > > note that what forwprop does (T)x & c -> (T)(x & c') I'd call type hoisting, > not sinking. Generally frontends and fold try to narrow operands when > possible (even though some targets later widen them again because of > instruction set constraints). > > Most of this hoisting code was done to make lowering logical && and || > I believe. Looking at the testcases added tells us that while matching > the two patterns as done now helps them but only because that pattern > feeds single-operand instructions that then simplify. So doing the transform > starting from that single-operand instructions instead looks like a better > fix (op !=/== 0/1 and (T) op) and also would not disagree with the > canonicalization done by fold. > > Richard. > >> I also can propose to introduce new hook for it but need to know your >> opinion since we don't went to waste our time on preparing dead >> patches. Note that x86 supports all short types in HW and such type >> sinkning is usually useless if short types are involved. >> >> Best regards. >> Yuri. >> >> >> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>> Richard, >>>> >>>> First of all, your proposal to move type sinking to the end of >>>> function does not work since we handle each statement in function and >>>> we want that 1st type folding of X & C will not happen. >>>> Note that we have the following sequence of gimple before forwprop1: >>>> >>>> x.0_10 = (signed char) x_8; >>>> _11 = x.0_10 & 1; >>>> _12 = (signed char) y_9; >>>> _13 = _12 & 1; >>>> _14 = _11 ^ _13; >>> >>> Ah, indeed. Reminds me of some of my dead patches that separated >>> forwprop into a forward and backward stage. Of course then you have >>> the ordering issue of whether to first forward or backward. >>> >>> Which means that I bet you can construct a testcase that with >>> your change is no longer optimized (just make pushing the conversion >>> make the types _match_). Which is always the case >>> with this kind of local pattern-matching transforms. >>> >>> Currently forwprop processes leafs of expression trees first (well, inside >>> a basic-block), similar to how fold () is supposed to be operated, based >>> on the idea that simplified / canonicalized leafs helps keeping pattern >>> recognition simple and cost considerations more accurate. >>> >>> When one order works better than another you always have to consider >>> that the user could already have written the code in a way that results >>> in the input that isn't well handled. >>> >>> Not that this helps very much for the situation ;) >>> >>> But I don't like the use of first_pass_instance ... and the fix isn't >>> an improvement but just a hack for the benchmark. >>> >>> Richard. >>> >>>> I also added comment to my fix and create new test for it. I also >>>> checked that this test is passed with patched compiler only. So >>>> Change Log was also modified: >>>> >>>> ChangeLog >>>> >>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>> >>>> PR tree-optimization/56175 >>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type sinking >>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & C >>>> for short integer types. >>>> * gcc.dg/pr56175.c: New test. >>>> >>>> >>>> >>>> >>>> 2013/2/20 Richard Biener <richard.guent...@gmail.com>: >>>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>> wrote: >>>>>> Hi All, >>>>>> >>>>>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C >>>>>> pattern in simpify_bitwise_binary for short integer types. >>>>>> The fix is very simple - we simply turn off short type sinking at the >>>>>> first pass of forward propagation allows to get >>>>>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and >>>>>> +5-7% for other x86 platforms too. >>>>>> Bootstrapping and regression testing were successful on x86-64. >>>>>> >>>>>> Is it Ok for trunk? >>>>> >>>>> It definitely needs a comment before the checks. >>>>> >>>>> Also I think it simply shows that the code is placed at the wrong spot. >>>>> Simply moving it down in simplify_bitwise_binary to be done the very last >>>>> should get both of the effects done. >>>>> >>>>> Can you rework the patch according to that? >>>>> >>>>> You also miss a testcase, we should make sure to not regress again here. >>>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>>> ChangeLog. >>>>>> >>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> >>>>>> PR tree-optimization/56175 >>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid type >>>>>> sinking >>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & >>>>>> C >>>>>> for short integer types.