On Fri, Nov 18, 2016 at 5:57 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Fri, Nov 18, 2016 at 4:52 PM, Michael Matz <m...@suse.de> wrote:
>> Hi,
>>
>> On Thu, 17 Nov 2016, Bin.Cheng wrote:
>>
>>> B) Depending on ilp, I think below test strings fail for long time with
>>> haswell:
>>> ! { dg-final { scan-tree-dump-times "Executing predictive commoning
>>> without unrolling" 1 "pcom" { target lp64 } } }
>>> ! { dg-final { scan-tree-dump-times "Executing predictive commoning
>>> without unrolling" 2 "pcom" { target ia32 } } }
>>> Because vectorizer choose vf==4 in this case, and there is no
>>> predictive commoning opportunities at all.
>>> Also the newly added test string fails in this case too because the
>>> prolog peeled iterates more than 1 times.
>>
>> Btw, this probably means that on haswell (or other archs with vf==4) mgrid
>> is slower than necessary. On mgrid you really really want predictive
>> commoning to happen. Vectorization isn't that interesting there.
> Interesting, I will check if there is difference between 2/4 vf. we
> do have cases that smaller vf is better and should be chosen, though
> for different reasons.
At some time in the past we had predictive commoning done before
vectorization (GCC 4.3 at least).

Patch is ok meanwhile.

Richard.

> Thanks,
> bin
>>
>>
>> Ciao,
>> Michael.