http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326



--- Comment #42 from rguenther at suse dot de <rguenther at suse dot de> 
2013-03-08 09:22:39 UTC ---

On Thu, 7 Mar 2013, steven at gcc dot gnu.org wrote:



>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39326
>
> --- Comment #38 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-07
> 22:15:39 UTC ---
> Created attachment 29612
>   --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29612
> Punt on loops with more memory references than LIM can handle
>
> For the LIM problem, I'm thinking of a patch like this one (not tested
> yet).  Basically, this means giving up on really large loops with many
> loads and/or stores.  That's not an unreasonable thing to do anyway.
> Code motion from really big loops usually isn't an improvement.
>
> Richi, could you have a look at this, and see if I've got it all right,
> more-or-less?  LIM is quite complicated and I'm not sure if I should
> look at, or update, the set of optimizable loops somewhere.



Yeah, well, it should be easy to avoid most of the overhead (even that
of collecting the memory references) with my proposed scheme (see patch).
First, for each BB, count the number of memory stmts (easy: look for
VUSEs).  Then, when walking the set of outermost loops we want to apply
LIM to, sum the BB counts of each loop and, if the number is too large,
walk over its siblings instead.
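
To make the scheme concrete, here is a minimal sketch in plain C.  It is
not the patch and does not use GCC internals: struct toy_loop,
sum_mem_stmts, select_loops and the max_refs budget are all hypothetical
names, and the per-BB counts are assumed to have been collected already.
It also assumes one particular reading of the walk: a loop over budget
is skipped, its inner loops are tried instead, and the walk then
continues with the next sibling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy loop-tree node; NOT GCC's struct loop.  */
struct toy_loop {
    const int *bb_mem_stmts;  /* memory stmt count per BB in this loop,
                                 including BBs of nested loops */
    size_t n_bbs;
    struct toy_loop *inner;   /* first nested loop, or NULL */
    struct toy_loop *next;    /* next sibling loop, or NULL */
    int selected;             /* set to 1 if LIM would be applied */
};

/* Sum the precomputed per-BB memory statement counts of one loop.  */
static long sum_mem_stmts(const struct toy_loop *loop)
{
    long total = 0;
    assert(loop != NULL);
    for (size_t i = 0; i < loop->n_bbs; i++)
        total += loop->bb_mem_stmts[i];
    return total;
}

/* Walk a sibling list.  A loop within budget is selected as a whole;
   a loop over budget is skipped and its inner loops are tried instead
   (an assumed interpretation).  Returns the number of selected loops.  */
static int select_loops(struct toy_loop *loop, long max_refs)
{
    int n_selected = 0;
    for (; loop != NULL; loop = loop->next) {
        if (sum_mem_stmts(loop) <= max_refs) {
            loop->selected = 1;
            n_selected++;
        } else {
            loop->selected = 0;
            n_selected += select_loops(loop->inner, max_refs);
        }
    }
    return n_selected;
}
```

The point of the sketch is that the decision needs only one integer per
BB, so an oversized loop is rejected before any memory reference is
collected for it.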



But yes, limiting it should be done, as it performs O(n^2) dependence
checks (well, O(#stores * #memory references), so for a low number of
stores it's quite cheap).
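
As a rough illustration of that bound, the sketch below counts how many
pairwise checks are made if every store is tested against every other
memory reference in the loop.  The enum, the function name and the
"every store against every other reference" model are assumptions made
for the example; the real LIM code is more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a memory reference.  */
enum toy_ref_kind { TOY_LOAD, TOY_STORE };

/* Count the pairwise checks made when each store is compared against
   every other reference: stores * (n_refs - 1) in total.  */
static unsigned long
count_dependence_checks(const enum toy_ref_kind *refs, size_t n_refs)
{
    unsigned long checks = 0;
    assert(refs != NULL || n_refs == 0);
    for (size_t i = 0; i < n_refs; i++) {
        if (refs[i] != TOY_STORE)
            continue;               /* loads start no checks */
        for (size_t j = 0; j < n_refs; j++)
            if (j != i)
                checks++;           /* one check per (store, ref) pair */
    }
    return checks;
}
```

With a single store the count grows linearly in the number of
references; it only becomes quadratic when most references are stores.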



I've yet to recover the obstack-ification of the struct depend,
struct mem_ref, struct mem_ref_locs and lim_aux_data allocations ...
(LIM is the biggest load on malloc/free).
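
For readers unfamiliar with the idea, here is a small sketch of what
obstack allocation looks like, using the GNU obstack interface from
<obstack.h> (glibc; GCC itself carries its own copy in libiberty).
struct toy_mem_ref and build_and_release are hypothetical stand-ins,
not the LIM data structures.

```c
#include <assert.h>
#include <obstack.h>
#include <stdlib.h>

/* The obstack macros require these two names to be defined.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical stand-in for a small per-reference record.  */
struct toy_mem_ref {
    unsigned id;
    struct toy_mem_ref *next;
};

/* Allocate n records from one obstack, walk them, then release the
   whole pool at once instead of freeing each record individually.  */
static unsigned build_and_release(unsigned n)
{
    struct obstack pool;
    struct toy_mem_ref *head = NULL;
    unsigned count = 0;

    obstack_init(&pool);
    for (unsigned i = 0; i < n; i++) {
        struct toy_mem_ref *r = obstack_alloc(&pool, sizeof *r);
        r->id = i;
        r->next = head;
        head = r;
    }
    for (const struct toy_mem_ref *r = head; r != NULL; r = r->next)
        count++;

    assert(count == n);
    obstack_free(&pool, NULL);  /* frees every object in the pool */
    return count;
}
```

Many small records then cost a few chunk-sized malloc calls and a
single release, rather than one malloc/free pair per record.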



> With the patch, and with "-O2 -fgcse", I now have:
>
> gap_TlnLv4.c: In function 'RDFT_49152_1':
> gap_TlnLv4.c:37502:18: warning: -ftree-loop-im: number of memory references in
> loop exceeds the --param loops-max-datarefs-for-datadeps threshold
> [-Wdisabled-optimization]
>       t12308[500] = t12304[6144*i0+1000];
>                   ^
>
>  dead store elim1        :  57.70 ( 8%)
>  dead store elim2        :  76.82 (10%)
>  combiner                : 360.07 (48%)
>  reload CSE regs         :  55.44 ( 7%)
>  TOTAL                   : 743.77



Nice.  Well ... ;)
