On Wed, 24 Mar 2021, guojiufu wrote:

> On 2021-03-24 15:55, Richard Biener wrote:
> > On Wed, Mar 24, 2021 at 3:55 AM guojiufu <guoji...@linux.ibm.com> wrote:
> >>
> >> On 2021-03-23 16:25, Richard Biener via Gcc wrote:
> >> > On Tue, Mar 23, 2021 at 4:33 AM guojiufu <guoji...@imap.linux.ibm.com>
> >> > wrote:
> >> >>
> >> >> On 2021-03-22 16:31, Jakub Jelinek via Gcc wrote:
> >> >> > On Mon, Mar 22, 2021 at 09:22:26AM +0100, Richard Biener via Gcc
> >> >> > wrote:
> >> >> >> Better than doing loop versioning is to enhance SCEV (and thus also
> >> >> >> dependence analysis) to track extra conditions they need to handle
> >> >> >> cases similar as to how niter analysis computes it's 'assumptions'
> >> >> >> condition. That allows the versioning to be done when there's an
> >> >> >> actual beneficial transform (like vectorization) rather than just
> >> >> >> upfront for the eventual chance that there'll be any. Ideally such
> >> >> >> transform would then choose IVs in their transformed copy that
> >> >> >> are analyzable w/o repeating such versioning exercise for the next
> >> >> >> transform.
> >> >> >
> >> >> > And it might be beneficial to perform some type promotion/demotion
> >> >> > pass, either early during vectorization or separately before
> >> >> > vectorization
> >> >> > on a loop copy guarded with the ifns e.g. ifconv uses too.
> >> >> > Find out what type sizes the loop use, first try to demote
> >> >> > computations
> >> >> > to narrower types in the vectorized loop candidate (e.g. if something
> >> >> > is computed in a wider type only to have the result demoted to
> >> >> > narrower
> >> >> > type), then pick up the widest type size still in use in the loop (ok,
> >> >> > this assumes we don't mix multiple vector sizes in the loop, but
> >> >> > currently
> >> >> > our vectorizer doesn't do that) and try to promote computations that
> >> >> > could
> >> >> > be promoted to that type size.
> >> >> > We do partially something like that
> >> >> > during
> >> >> > vect patterns for bool types, but not other types I think.
> >> >> >
> >> >> > Jakub
> >> >>
> >> >> Thanks for the suggestions!
> >> >>
> >> >> Enhancing SCEV could help other optimizations and improve performance
> >> >> in
> >> >> some cases.
> >> >> While one of the direct ideas of using the '64bit type' is to
> >> >> eliminate
> >> >> conversions,
> >> >> even for some cases which are not easy to be optimized through
> >> >> ifconv/vectorization,
> >> >> for examples:
> >> >>
> >> >>   unsigned int i = 0;
> >> >>   while (a[i]>1e-3)
> >> >>     i++;
> >> >>
> >> >>   unsigned int i = 0;
> >> >>   while (p1[i] == p2[i] && p1[i] != '\0')
> >> >>     i++;
> >> >>
> >> >> Or only do versioning on type for this kind of loop? Any suggestions?
> >> >
> >> > But the "optimization" resulting from such versioning is hard to
> >> > determine upfront which means we'll pay quite a big code size cost
> >> > for unknown questionable gain. What's the particular optimization
> >>
> >> Right. Code size increasing is a big pain on large loops. If the gain
> >> is not significant, this optimization may not positive.
> >>
> >> > in the above cases? Note that for example for
> >> >
> >> >   unsigned int i = 0;
> >> >   while (a[i]>1e-3)
> >> >     i++;
> >> >
> >> > you know that when 'i' wraps then the loop will not terminate. There's
> >>
> >> Thanks :) The code would be "while (a[i]>1e-3 && i < n)", the upbound is
> >> checkable. Otherwise, the optimization to avoid zext is not adoptable.
> >>
> >> > the address computation that is i * sizeof (T) which is done in a
> >> > larger
> >> > type to avoid overflow so we have &a + zext (i) * 8 - is that the
> >> > operation
> >> > that is 'slow' for you?
> >>
> >> This is the point: "zext(i)" is the instruction that I want to
> >> eliminate,
> >> which is the direct goal of the optimization.
> >>
> >> The gain of eliminating the 'zext' is visible or not, and the code size
> >> increasing is small enough or not, this is a question and needs to
> >> trade-off.
> >> It may be only acceptable if the loop is very small, then eliminating
> >> 'zext'
> >> would help to save runtime, and code size increase maybe not big.
> >
> > OK, so I indeed think that the desire to micro-optimize a 'zext' doesn't
> > make versioning a good trade-off. The micro-architecture should better
> > not make that totally slow (I'd expect an extra latency comparable to
> > the multiply or add on the &a + zext(i) * 8 instruction chain).
>
> Agree, I understand your point. The concern is some micro-architectures are
> not
> very well on this yet. I tested the above example code:
>   unsigned i = 0;
>   while (a[i] > 1e-3 && i < n)
>     i++;
> there are ~30% performance improvement if using "long i" instead of"unsigned
> i"
> on ppc64le and x86. It seems those instructions are not optimized too much on
> some platforms. So, I'm wondering we need to do this in GCC.

On x86 I see indexed addressing modes being used which should be fine.
Compilable testcase:

unsigned foo (double *a, unsigned n)
{
  unsigned i = 0;
  while (a[i] > 1e-3 && i < n)
    i++;
  return i;
}

ppc64le seems to do some odd unrolling/peeling or whatnot, I have a hard
time following its assembly ... ah, -fno-unroll-loops "helps" and
produces

.L5:
        lfd %f0,0(%r9)
        addi %r3,%r3,1
        addi %r9,%r9,8
        rldicl %r3,%r3,0,32
        fcmpu %cr0,%f0,%f12
        bnglr %cr0
        bdnz .L5

which looks pretty good to me, I suppose the rldicl is the
zero-extension but the IVs are already 64bit and the zero-extension
should be sunk to the loop exit instead.

> >
> > OTOH making SCEV analysis not give up but instead record the constraints
> > under which its solution is valid is a very good and useful thing to do.
>
> Thanks! Enhance SCEV could help a few cases, especially when other
> optimizations
> are enabled.
>
> Thanks again for your suggestions!
>
> BR.
> Jiufu Guo.
>
> >
> > Richard.
> >
> >> Thanks again for your very helpful comments!
> >>
> >> BR.
> >> Jiufu Guo.
> >>
> >> >
> >> > Richard.
> >> >
> >> >> BR.
> >> >> Jiufu Guo.

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
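
For readers following the thread, here is a minimal source-level sketch of what "sinking the zero-extension to the loop exit" could look like for the testcase above. The name foo_sunk is hypothetical and used only for illustration; nothing here claims to show what any GCC pass actually emits. The hand-written rewrite is assumed to be valid only because the "i < n" check keeps the 32-bit counter from ever wrapping, so a wider counter takes exactly the same values.

```c
/* Original testcase from the thread: 32-bit induction variable, so the
   index has to be zero-extended for each address computation.  */
unsigned foo (double *a, unsigned n)
{
  unsigned i = 0;
  while (a[i] > 1e-3 && i < n)
    i++;
  return i;
}

/* Hypothetical hand-written variant (the name foo_sunk is an assumption,
   not a GCC entity): the counter lives in a 64-bit variable inside the
   loop and is narrowed once, after the loop exits.  Because i never
   exceeds n, and n fits in 'unsigned', the narrowing loses nothing.  */
unsigned foo_sunk (double *a, unsigned n)
{
  unsigned long long i = 0;
  while (a[i] > 1e-3 && i < n)
    i++;
  return (unsigned) i;  /* single narrowing at the loop exit */
}
```

Both variants read a[i] before testing i < n, so, as in the original testcase, the array needs at least n + 1 readable elements. For a = {1.0, 2.0, 0.5, 0.0, 3.0} both functions return 3 with n = 4 and 2 with n = 2.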