On Thu, Jun 2, 2011 at 1:08 PM, Ira Rosen <ira.ro...@linaro.org> wrote: > On 2 June 2011 12:59, Richard Guenther <richard.guent...@gmail.com> wrote: >> On Thu, Jun 2, 2011 at 10:46 AM, Ira Rosen <ira.ro...@linaro.org> wrote: >>> On 1 June 2011 15:14, Richard Guenther <richard.guent...@gmail.com> wrote: >>>> On Wed, Jun 1, 2011 at 1:37 PM, Ira Rosen <ira.ro...@linaro.org> wrote: >>>>> On 1 June 2011 12:42, Richard Guenther <richard.guent...@gmail.com> wrote: >>>>> >>>>>> Did you think about moving pass_optimize_widening_mul before >>>>>> loop optimizations? Does that pass catch the cases you are >>>>>> teaching the pattern recognizer? I think we should try to expose >>>>>> these more complicated instructions to loop optimizers. >>>>>> >>>>> >>>>> pass_optimize_widening_mul doesn't catch these cases, but I can try to >>>>> teach it instead of the vectorizer. >>>>> I am now testing >>>>> >>>>> Index: passes.c >>>>> =================================================================== >>>>> --- passes.c (revision 174391) >>>>> +++ passes.c (working copy) >>>>> @@ -870,6 +870,7 @@ >>>>> NEXT_PASS (pass_split_crit_edges); >>>>> NEXT_PASS (pass_pre); >>>>> NEXT_PASS (pass_sink_code); >>>>> + NEXT_PASS (pass_optimize_widening_mul); >>>>> NEXT_PASS (pass_tree_loop); >>>>> { >>>>> struct opt_pass **p = &pass_tree_loop.pass.sub; >>>>> @@ -934,7 +935,6 @@ >>>>> NEXT_PASS (pass_forwprop); >>>>> NEXT_PASS (pass_phiopt); >>>>> NEXT_PASS (pass_fold_builtins); >>>>> - NEXT_PASS (pass_optimize_widening_mul); >>>>> NEXT_PASS (pass_tail_calls); >>>>> NEXT_PASS (pass_rename_ssa_copies); >>>>> NEXT_PASS (pass_uncprop); >>>>> >>>>> to see how it affects other loop optimizations (vectorizer pattern >>>>> tests obviously fail). >>> >>> Looks like it needs copy_prop and dce as well: >>> >>> Index: passes.c >>> =================================================================== >>> --- passes.c (revision 174391) >>> +++ passes.c (working copy) >>> @@ -870,6 +870,9 @@ >>> NEXT_PASS (pass_split_crit_edges); >>> NEXT_PASS (pass_pre); >>> NEXT_PASS (pass_sink_code); >>> + NEXT_PASS (pass_copy_prop); >>> + NEXT_PASS (pass_dce); >>> + NEXT_PASS (pass_optimize_widening_mul); >>> NEXT_PASS (pass_tree_loop); >>> { >>> struct opt_pass **p = &pass_tree_loop.pass.sub; >>> @@ -934,7 +937,6 @@ >>> NEXT_PASS (pass_forwprop); >>> NEXT_PASS (pass_phiopt); >>> NEXT_PASS (pass_fold_builtins); >>> - NEXT_PASS (pass_optimize_widening_mul); >>> NEXT_PASS (pass_tail_calls); >>> NEXT_PASS (pass_rename_ssa_copies); >>> NEXT_PASS (pass_uncprop); >>> >>> otherwise I get (on x86_64-suse-linux) >>> >>> FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddss >>> FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmaddsd >>> FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubss >>> FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfmsubsd >>> FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddss >>> FAIL: gcc.target/i386/fma4-fma-2.c scan-assembler vfnmaddsd >> >> Hmm. I would have put the pass next to the sincos pass, but yes, >> in principle a copyprop & dce pass after PRE makes sense >> (the loop passes likely don't run because there are no loops in >> those testcases - both copyprop and dce should be scheduled >> more like TODOs, or even automatically by the pass manager >> via PROPs ...). Dead code can indeed confuse those matching >> passes that look for single-use vars. >> >> I'll think about a more elegant solution for this problem. >> >> Would you mind checking if the next-to-sincos position makes >> any difference? > > Before sincos we have > > D.2747_2 = __builtin_powf (a_1(D), 2.0e+0); > D.2746_4 = D.2747_2 + c_3(D); > > which is transformed by sincos to > > powmult.8_7 = a_1(D) * a_1(D); > D.2747_2 = powmult.8_7; > D.2746_4 = D.2747_2 + c_3(D); > > but widening_mul is confused by D.2747_2 = powmult.8_7; and it needs > both copy_prop and dce to remove it: > > powmult.8_7 = a_1(D) * a_1(D); > D.2746_4 = c_3(D) + powmult.8_7; > > So moving widening_mul next to sincos doesn't help. > Maybe gimple_expand_builtin_pow() can be changed to generate the last > version by itself?
Yeah, I guess so. I'll have a look. Richard. > Ira > >> >> Thanks, >> Richard. >> >>> Ira >>> >>>> >>>> Thanks. I would hope that we eventually can get rid of the >>>> pattern recognizer ... at least for SSE there is also always >>>> a scalar variant instruction for each vectorized one. >>>> >>>> Richard. >>>> >>> >> >