Please review the changes.html change and suggest better wordings if possible:
Index: htdocs/gcc-4.9/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.26
diff -u -r1.26 changes.html
--- htdocs/gcc-4.9/changes.html 26 Aug 2013 14:16:31 -0000 1.26
+++ htdocs/gcc-4.9/changes.html 26 Sep 2013 18:02:33 -0000
@@ -37,6 +37,7 @@
 <ul>
 <li>AddressSanitizer, a fast memory error detector, is now available on ARM.
 </li>
+ <li>GCC introduces a new cost model for vectorizer, called 'cheap' model. The new cost model is intended to minimize compile time, code size, and potential negative runtime impact introduced when vectorizer is turned on at the expense of not getting the maximum potential runtime speedup. The 'cheap' model will be the default when vectorizer is turned on at <code>-O2</code>. To override this, use option <code>-fvect-cost-model=[cheap|dynamic|unlimited]</code>.
 </ul>

 <h2>New Languages and Language specific improvements</h2>

thanks,

David

On Thu, Sep 26, 2013 at 11:09 AM, Xinliang David Li <davi...@google.com> wrote:
> On Thu, Sep 26, 2013 at 7:37 AM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Thu, Sep 26, 2013 at 1:10 AM, Xinliang David Li <davi...@google.com>
>> wrote:
>>> I took the liberty to pick up Richard's original fvect-cost-model
>>> patch and made some modification.
>>>
>>> What has not changed:
>>> 1) option -ftree-vect-loop-version is removed;
>>> 2) three cost models are introduced: cheap, dynamic, and unlimited;
>>> 3) unless explicitly specified, cheap model is the default at O2 (e.g.
>>> when -ftree-loop-vectorize is used with -O2), and dynamic mode is the
>>> default for O3 and FDO
>>> 4) alignment based versioning is disabled with cheap model.
>>>
>>> What has changed:
>>> 1) peeling is also disabled with cheap model;
>>> 2) alias check condition limit is reduced with cheap model, but not
>>> completely suppressed. Runtime alias check is a pretty important
>>> enabler.
>>> 3) tree if conversion changes are not included.
>>>
>>> Does this patch look reasonable?
>>
>> In principle yes. Note that it changes the behavior of -O2 -ftree-vectorize
>> as -ftree-vectorize does not imply changing the default cost model. I am
>> fine with that, but eventually this will have some testsuite fallout. This
>> reorg would also need documenting in changes.html to make people
>> aware of this.
>
>
> Here is the proposed change:
>
>
> Index: htdocs/gcc-4.9/changes.html
> ===================================================================
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
> retrieving revision 1.26
> diff -u -r1.26 changes.html
> --- htdocs/gcc-4.9/changes.html 26 Aug 2013 14:16:31 -0000 1.26
> +++ htdocs/gcc-4.9/changes.html 26 Sep 2013 18:02:33 -0000
> @@ -37,6 +37,7 @@
> <ul>
> <li>AddressSanitizer, a fast memory error detector, is now
> available on ARM.
> </li>
> + <li>GCC introduces a new cost model for vectorizer, called
> 'cheap' model. The new cost model is intended to minimize compile
> time, code size, and potential negative runtime impact introduced when
> vectorizer is turned on at the expense of not getting the maximum
> potential runtime speedup. The 'cheap' model will be the default when
> vectorizer is turned on at <code>-O2</code>. To override this, use
> option <code>-fvect-cost-model=[cheap|dynamic|unlimited]</code>.
> </ul>
>
> <h2>New Languages and Language specific improvements</h2>
>
>
>>
>> With completely disabling alignment peeling and alignment versioning
>> you cut out targets that have no way of performing unaligned accesses.
>> From looking at vect_no_align these are mips, sparc, ia64 and some arm.
>> A compromise for them would be to allow peeling a single iteration
>> and some alignment checks (like up to two?).
>>
>
> Possibly. I think target owners can choose to do target specific
> tunings as follow up.
>
>
>> Reducing the number of allowed alias-checks is ok, but I'd reduce it
>> more than to 6 (was that an arbitrary number or is that the result of
>> some benchmarking?)
>>
>
> yes -- we found that it is not uncommon to have a loop with 2 or 3
> distinct source addresses and 1 or 2 target addresses.
>
> There are also tuning opportunities. For instance, in cases where
> source addresses are derived from the same base, a consolidated alias
> check (against the whole access range instead of just checking cross
> 1-unrolled iteration dependence) can be done.
>
>> I suppose all of the params could use some benchmarking to select
>> a sweet spot in code size vs. runtime.
>
> Agree.
>
>
>>
>> I suppose the patch is ok as-is (if it actually works) if you provide
>> a changelog and propose an entry for changes.html. We can
>> tune the params for the cheap model as followup.
>
> Ok. I will do more testing and check in the patch with proper
> ChangeLog. The changes.html change will be done separately.
>
> thanks,
>
> David
>
>
>>
>> Thanks for picking this up,
>> Richard.
>>
>>> thanks,
>>>
>>> David
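
For readers unfamiliar with the runtime alias checks discussed in this thread, here is a hand-written C sketch of the idea. It shows a loop with two source addresses and one target address, guarded by two whole-range overlap checks. This is only an illustration under stated assumptions: the names add_arrays and ranges_disjoint are hypothetical and do not come from the patch, and the vectorizer generates its checks internally rather than in source form.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero when the n-element float ranges
   starting at a and b do not overlap (a whole-access-range check). */
int ranges_disjoint(const float *a, const float *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(n * sizeof(float));
    return pa + bytes <= pb || pb + bytes <= pa;
}

/* Hypothetical example loop: two sources, one target, so two checks. */
void add_arrays(float *dst, const float *src1, const float *src2, size_t n)
{
    if (ranges_disjoint(dst, src1, n) && ranges_disjoint(dst, src2, n)) {
        /* No overlap: iterations are independent, so this copy of the
           loop is the one a vectorizer could safely turn into SIMD code. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src1[i] + src2[i];
    } else {
        /* Possible overlap: keep the original scalar order. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src1[i] + src2[i];
    }
}
```

Each additional distinct source or target address adds more pairwise checks, which is why the allowed number of checks trades code size against vectorization opportunities. To compare cost models on such a loop, one could compile with, for example, gcc -O2 -ftree-vectorize -fvect-cost-model=cheap and again with -fvect-cost-model=dynamic, assuming a GCC that includes this patch.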