David Malcolm <dmalc...@redhat.com> writes: > On Wed, 2018-10-24 at 14:05 +0100, Richard Sandiford wrote: >> This patch adds a pass that versions loops with variable index >> strides >> for the case in which the stride is 1. E.g.: >> >> for (int i = 0; i < n; ++i) >> x[i * stride] = ...; >> >> becomes: >> >> if (stepx == 1) >> for (int i = 0; i < n; ++i) >> x[i] = ...; >> else >> for (int i = 0; i < n; ++i) >> x[i * stride] = ...; >> >> This is useful for both vector code and scalar code, and in some >> cases >> can enable further optimisations like loop interchange or pattern >> recognition. >> >> The pass gives a 7.6% improvement on Cortex-A72 for 554.roms_r at -O3 >> and a 2.4% improvement for 465.tonto. I haven't found any SPEC tests >> that regress. >> >> Sizewise, there's a 10% increase in .text for both 554.roms_r and >> 465.tonto. That's obviously a lot, but in tonto's case it's because >> the whole program is written using assumed-shape arrays and pointers, >> so a large number of functions really do benefit from versioning. >> roms likewise makes heavy use of assumed-shape arrays, and that >> improvement in performance IMO justifies the code growth. >> >> The next biggest .text increase is 4.5% for 548.exchange2_r. I did >> see >> a small (0.4%) speed improvement there, but although both 3-iteration >> runs >> produced stable results, that might still be noise. There was a >> slightly >> larger (non-noise) improvement for a 256-bit SVE model. >> >> 481.wrf and 521.wrf_r .text grew by 2.8% and 2.5% respectively, but >> without any noticeable improvement in performance. No other test >> grew >> by more than 2%. >> >> Although the main SPEC beneficiaries are all Fortran tests, the >> benchmarks we use for SVE also include some C and C++ tests that >> benefit. >> >> Using -frepack-arrays gives the same benefits in many Fortran cases. >> The problem is that using that option inappropriately can force a >> full >> array copy for arguments that the function only reads once, and so it >> isn't really something we can turn on by default. The new pass is >> supposed to give most of the benefits of -frepack-arrays without >> the risk of unnecessary repacking. >> >> The patch therefore enables the pass by default at -O3. >> >> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >> >> Richard >> > > [...snip...] > >> +/* Run the pass and return a set of TODO_* flags. */ >> + >> +unsigned int >> +loop_versioning::run () >> +{ >> + gcc_assert (scev_initialized_p ()); >> + >> + if (!analyze_blocks () >> + || !prune_conditions () >> + || !make_versioning_decisions () >> + || !implement_versioning_decisions ()) >> + return 0; >> + >> + return TODO_update_ssa; >> +} >> + >> +/* Loop versioningting pass. */ > > (typo)
Huh, no idea how I even got there. >> + >> +namespace { > > Could the whole file be within this anonymous namespace, rather than > just the opt_pass subclass? (hiding class loop_versioning, so that the > optimizer knows that the only thing visible outside the TU is > make_pass_loop_versioning). This can be a pain to debug, but you can > always comment out the anon namespace locally when debugging. Yeah, I prefer that style too, so that the TU only exports public interfaces. I'd got the impression from earlier threads that this was frowned on for GCC and so I was deliberately avoiding it, but if it's OK then great. Thanks, Richard