On Wed, Jul 13, 2011 at 4:34 PM, Richard Guenther <richard.guent...@gmail.com> wrote: > On Wed, Jul 13, 2011 at 3:13 PM, Andreas Krebbel > <kreb...@linux.vnet.ibm.com> wrote: >> Hi, >> >> the widening_mul pass might increase the number of multiplications in >> the code by transforming >> >> a = b * c >> d = a + 2 >> e = a + 3 >> >> into: >> >> d = b * c + 2 >> e = b * c + 3 >> >> under the assumption that an FMA instruction is not more expensive >> than a simple add. This certainly isn't always true. While e.g. on >> s390 an fma is indeed not slower than an add execution-wise it has >> disadvantages regarding instruction grouping. It doesn't group with >> any other instruction what has a major impact on the instruction >> dispatch bandwidth. >> >> The following patch tries to figure out the costs for adds, mults and >> fmas by building an RTX and asking the backends cost function in order >> to estimate whether it is whorthwhile doing the transformation. >> >> With that patch the 436.cactus hotloop contains 28 less >> multiplications than before increasing performance slightly (~2%). >> >> Bootstrapped and regtested on x86_64 and s390x. > > Ick ;)
+1 > Maybe this is finally the time to introduce target hook(s) to > get us back costs for trees? For this case we'd need two > actually, or just one - dependent on what finegrained information > we pass. Choices: > > tree_code_cost (enum tree_code) > tree_code_cost (enum tree_code, enum machine_mode mode) > unary_cost (enum tree_code, tree actual_arg0) // args will be mostly > SSA names or constants, but at least they are typed - works for > mixed-typed operations > binary_cost (...) > ... > unary_cost (enum tree_code, enum tree_code arg0_kind) // constant > vs. non-constant arg, but lacks type/mode Or maybe add a cost function for all named insns (i.e. http://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html#Standard-Names)? I think that any form of lower GIMPLE will not be so low level that more combinations will exist than the available named patterns. It should be possible to write a gen* tool using rtx_costs to compute some useful cost metric for all named patterns. How complicated that could be (modes, reg vs. mem, etc.), I don't know... But at least that way we don't end up with multiple target costs depending on the IR in use. Ciao! Steven