On 2 June 2011 12:59, Richard Guenther <richard.guent...@gmail.com> wrote: > On Thu, Jun 2, 2011 at 10:46 AM, Ira Rosen <ira.ro...@linaro.org> wrote: >> On 1 June 2011 15:14, Richard Guenther <richard.guent...@gmail.com> wrote: >>> On Wed, Jun 1, 2011 at 1:37 PM, Ira Rosen <ira.ro...@linaro.org> wrote: >>>> On 1 June 2011 12:42, Richard Guenther <richard.guent...@gmail.com> wrote: >>>> >>>>> Did you think about moving pass_optimize_widening_mul before >>>>> loop optimizations? Does that pass catch the cases you are >>>>> teaching the pattern recognizer? I think we should try to expose >>>>> these more complicated instructions to loop optimizers. >>>>> >>>> >>>> pass_optimize_widening_mul doesn't catch these cases, but I can try to >>>> teach it instead of the vectorizer. >>>> I am now testing >>>> >>>> Index: passes.c >>>> =================================================================== >>>> --- passes.c (revision 174391) >>>> +++ passes.c (working copy) >>>> @@ -870,6 +870,7 @@ >>>> NEXT_PASS (pass_split_crit_edges); >>>> NEXT_PASS (pass_pre); >>>> NEXT_PASS (pass_sink_code); >>>> + NEXT_PASS (pass_optimize_widening_mul); >>>> NEXT_PASS (pass_tree_loop); >>>> { >>>> struct opt_pass **p = &pass_tree_loop.pass.sub; >>>> @@ -934,7 +935,6 @@ >>>> NEXT_PASS (pass_forwprop); >>>> NEXT_PASS (pass_phiopt); >>>> NEXT_PASS (pass_fold_builtins); >>>> - NEXT_PASS (pass_optimize_widening_mul); >>>> NEXT_PASS (pass_tail_calls); >>>> NEXT_PASS (pass_rename_ssa_copies); >>>> NEXT_PASS (pass_uncprop); >>>> >>>> to see how it affects other loop optimizations (vectorizer pattern >>>> tests obviously fail). >> >> Looks like it needs copy_prop and dce as well: >> >> Index: passes.c >> =================================================================== >> --- passes.c (revision 174391) >> +++ passes.c (working copy) >> @@ -870,6 +870,9 @@ >> NEXT_PASS (pass_split_crit_edges); >> NEXT_PASS (pass_pre); >> NEXT_PASS (pass_sink_code); >> + NEXT_PASS (pass_copy_prop); >> + NEXT_PASS (pass_dce); >> + NEXT_PASS (pass_optimize_widening_mul); >> NEXT_PASS (pass_tree_loop); >> { >> struct opt_pass **p = &pass_tree_loop.pass.sub; >> @@ -934,7 +937,6 @@ >> NEXT_PASS (pass_forwprop); >> NEXT_PASS (pass_phiopt); >> NEXT_PASS (pass_fold_builtins); >> - NEXT_PASS (pass_optimize_widening_mul); >> NEXT_PASS (pass_tail_calls); >> NEXT_PASS (pass_rename_ssa_copies); >> NEXT_PASS (pass_uncprop); >> >> otherwise I get (on x86_64-suse-linux) >> >> FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddss >> FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddsd >> FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubss >> FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubsd >> FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddss >> FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddsd > > Hmm. I would have put the pass next to the sincos pass, but yes, > in principle a copyprop & dce pass after PRE makes sense > (the loop passes likely don't run because there are no loops in > those testcases - both copyprop and dce should be scheduled > more like TODOs, or even automatically by the pass manager > via PROPs ...). Dead code can indeed confuse those matching > passes that look for single-use vars. > > I'll think about a more elegant solution for this problem. > > Would you mind checking if the next-to-sincos position makes > any difference?
Before sincos we have D.2747_2 = __builtin_powf (a_1(D), 2.0e+0); D.2746_4 = D.2747_2 + c_3(D); which is transformed by sincos to powmult.8_7 = a_1(D) * a_1(D); D.2747_2 = powmult.8_7; D.2746_4 = D.2747_2 + c_3(D); but widening_mul is confused by D.2747_2 = powmult.8_7; and it needs both copy_prop and dce to remove it: powmult.8_7 = a_1(D) * a_1(D); D.2746_4 = c_3(D) + powmult.8_7; So moving widening_mul next to sincos doesn't help. Maybe gimple_expand_builtin_pow() can be changed to generate the last version by itself? Ira > > Thanks, > Richard. > >> Ira >> >>> >>> Thanks. I would hope that we eventually can get rid of the >>> pattern recognizer ... at least for SSE there is also always >>> a scalar variant instruction for each vectorized one. >>> >>> Richard. >>> >> >