On Thu, 28 May 2020, Kewen.Lin wrote:

> Hi,
>
> This is a repost; you can refer to the original series
> via https://gcc.gnu.org/pipermail/gcc-patches/2020-January/538360.html.
>
> As we discussed in the thread
> https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html
> Original: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00104.html,
> I'm working to teach IVOPTs to consider D-form group access during unrolling.
> The difference between D-form and other forms during unrolling is that we can
> put the stride into the displacement field to avoid an additional step
> increment. E.g.:
>
> With X-form (uf step increments):
>   ...
>   LD A = baseA, X
>   LD B = baseB, X
>   ST C = baseC, X
>   X = X + stride
>   LD A = baseA, X
>   LD B = baseB, X
>   ST C = baseC, X
>   X = X + stride
>   LD A = baseA, X
>   LD B = baseB, X
>   ST C = baseC, X
>   X = X + stride
>   ...
>
> With D-form (one step increment for each base):
>   ...
>   LD A = baseA, OFF
>   LD B = baseB, OFF
>   ST C = baseC, OFF
>   LD A = baseA, OFF+stride
>   LD B = baseB, OFF+stride
>   ST C = baseC, OFF+stride
>   LD A = baseA, OFF+2*stride
>   LD B = baseB, OFF+2*stride
>   ST C = baseC, OFF+2*stride
>   ...
>   baseA += stride * uf
>   baseB += stride * uf
>   baseC += stride * uf
>
> If the loop gets unrolled 8 times, that is 3 step updates with D-form
> vs. 8 step updates with X-form. Here we only need to check that the stride
> meets the D-form field requirement, since if OFF doesn't, we can construct
> baseA' as baseA + OFF.
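To make the step-update count in the quoted example concrete, here is a minimal Python model. It is only a sketch, not GCC code: all function names are hypothetical, and the 16-bit signed displacement width is an assumption chosen for illustration, not something stated in the patch description.

```python
def x_form_step_updates(uf: int) -> int:
    """X-form: the shared index X is bumped once per unrolled copy."""
    return uf


def d_form_step_updates(num_bases: int) -> int:
    """D-form: each base register is bumped once per unrolled loop body."""
    return num_bases


def d_form_offsets(off: int, stride: int, uf: int) -> list:
    """Displacements used by the uf unrolled copies of one access."""
    return [off + i * stride for i in range(uf)]


def stride_fits_displacement(stride: int, uf: int, bits: int = 16) -> bool:
    """Check that every multiple of the stride fits the displacement field.

    Assumption: a signed field of `bits` bits. OFF is left out on purpose,
    since it can be folded into the base (baseA' = baseA + OFF).
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return all(lo <= i * stride <= hi for i in range(uf))


if __name__ == "__main__":
    uf, bases = 8, 3  # the case from the quoted text: A, B, C unrolled 8 times
    print("X-form step updates:", x_form_step_updates(uf))
    print("D-form step updates:", d_form_step_updates(bases))
    print("D-form offsets:", d_form_offsets(0, 16, 3))
    print("stride 16 fits:", stride_fits_displacement(16, uf))
```

With uf = 8 and three bases this reproduces the 8 vs. 3 step updates mentioned above; the D-form advantage only appears when uf exceeds the number of bases.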

I'd just mention there are other targets that have the choice between
the above forms. Since IVOPTs itself does not perform the unrolling,
the IL it produces is the same, correct?

Richard.

> This patch set consists of four parts:
>
>   [PATCH 1/4] unroll: Add middle-end unroll factor estimation
>
>      Add unroll factor estimation in the middle-end. It mainly refers to the
>      current RTL unroll factor determination in function decide_unrolling and
>      its sub calls. As Richi suggested, we probably can force the unroll
>      factor with this and avoid duplicate unroll factor calculation, but I
>      think it needs more benchmarking work and should be handled separately.
>
>   [PATCH 2/4] param: Introduce one param to control unroll factor
>
>      As Richard and Segher suggested, I used addr_offset_valid_p for the
>      addressing mode, rather than a target hook. As Richard suggested,
>      it introduces one parameter to control this IVOPTs consideration and
>      further tweaking [3/4] on top of unroll factor estimation [1/4].
>
>   [PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling
>
>      Teach IVOPTs to mark the IV cand as reg_offset_p, which is derived from
>      one address IV type group where the whole group is valid to use
>      reg_offset mode. Then scale up the IV cand step cost by (uf - 1) for
>      non-reg_offset_p IV cands; here uf is the estimated unroll factor
>      from [1/4].
>
>   [PATCH 4/4] rs6000: P9 D-form test cases
>
>      Add some test cases, mainly copied from Kelvin's patch. This is approved
>      by Segher if the whole series is fine.
>
>
> Many thanks to Richard and Segher for reviewing the previous version.
>
> Bootstrapped and regress tested on powerpc64le-linux-gnu.
>
> Any comments are highly appreciated! Thanks in advance!
>
>
> BR,
> Kewen
>
> -------
>
>  gcc/cfgloop.h                  |   3 ++
>  gcc/config/i386/i386-options.c |   6 +++
>  gcc/config/s390/s390.c         |   6 +++
>  gcc/doc/invoke.texi            |   9 +++++
>  gcc/params.opt                 |   4 ++
>  gcc/tree-ssa-loop-ivopts.c     | 100 ++++++++++++++++++++++++++++++++++++++++++++++-
>  gcc/tree-ssa-loop-manip.c      | 253 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  gcc/tree-ssa-loop-manip.h      |   3 +-
>  gcc/tree-ssa-loop.c            |  33 ++++++++++++++++
>  gcc/tree-ssa-loop.h            |   2 +
>  10 files changed, 416 insertions(+), 3 deletions(-)
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)