On Mon, Nov 23, 2015 at 4:52 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
> Hi Richard,
>
> Did you have a chance to look at this?

It's on my list - I'm still swamped with patches to review.

Richard.

> Thanks.
> Yuri.
>
> 2015-11-13 13:35 GMT+03:00 Yuri Rumyantsev <ysrum...@gmail.com>:
>> Hi Richard,
>>
>> Here is an updated version of the patch, which (1) is in sync with the
>> trunk compiler and (2) contains a simple cost model to estimate the
>> profitability of scalar epilogue elimination. The part related to
>> vectorizing loops with a small trip count is still under development.
>> Note that the implemented cost model has not been tuned well for HASWELL
>> and KNL, but we got a ~6% speed-up on 436.cactusADM from the spec2006
>> suite on HASWELL.
>>
>> 2015-11-10 17:52 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>> On Tue, Nov 10, 2015 at 2:02 PM, Ilya Enkovich <enkovich....@gmail.com>
>>> wrote:
>>>> 2015-11-10 15:30 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>> On Tue, Nov 3, 2015 at 1:08 PM, Yuri Rumyantsev <ysrum...@gmail.com>
>>>>> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> It looks like a misunderstanding: we assumed that for GCCv6 the simple
>>>>>> remainder scheme would be used, by introducing a new IV:
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
>>>>>>
>>>>>> Is that right, or did we miss something?
>>>>>
>>>>> <quote>
>>>>>> > Do you have an idea how "masking" is better be organized to be usable
>>>>>> > for both 4b and 4c?
>>>>>>
>>>>>> Do 2a ...
>>>>> Okay.
>>>>> </quote>
>>>>
>>>> 2a was 'transform already vectorized loop as a separate
>>>> post-processing'. Isn't that what this prototype patch implements?
>>>> The current version only masks the loop body, which in practice is
>>>> applicable only to AVX-512 in most cases. With AVX-512 it's easier to
>>>> see how profitable masking might be, and it is the main target for the
>>>> first masking version. Extending it to prologues/epilogues, and thus
>>>> making it more profitable for other targets, is the next step and is
>>>> out of the scope of this patch.
>>>
>>> Ok, technically the prototype transforms the already vectorized loop.
>>> Of course I meant the vectorized loop to be copied and masked, and the
>>> result used as the epilogue...
>>>
>>> I'll queue a more detailed look into the patch for this week.
>>>
>>> Did you perform any measurements with this patch, like the number of
>>> masked epilogues in SPEC 2006 FP (and any speedup)?
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> Ilya
>>>>
>>>>>
>>>>> Richard.
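The schemes discussed in this thread can be illustrated with a minimal C sketch. It is not code from the patch or from GCC: it emulates a vector of VF lanes with a plain loop and an "active lanes" count standing in for an AVX-512 mask register. It contrasts three ways of handling the n % VF leftover elements: a scalar epilogue, masking every iteration of the vectorized loop body (what Ilya describes the prototype as doing), and running one masked copy of the vector body as the epilogue (what Richard describes). VF, all function names, and the cost parameters of `masked_epilogue_profitable` are hypothetical; that function is a toy comparison, not the cost model in Yuri's patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vectorization factor, e.g. 16 x 32-bit lanes on AVX-512. */
#define VF 16

/* One emulated vector iteration over VF lanes starting at element i.
   Lanes at or beyond `active` are switched off, the way a mask register
   would disable them; active == VF means an unmasked iteration. */
static void vec_add(float *restrict a, const float *restrict b,
                    size_t i, size_t active)
{
    for (size_t lane = 0; lane < VF; lane++)
        if (lane < active)
            a[i + lane] += b[i + lane];
}

/* Baseline: unmasked vector loop plus a scalar epilogue for n % VF elements.
   Returns the number of scalar epilogue iterations executed. */
static size_t add_scalar_epilogue(float *restrict a, const float *restrict b,
                                  size_t n)
{
    size_t i = 0;
    for (; i + VF <= n; i += VF)
        vec_add(a, b, i, VF);
    size_t scalar_iters = 0;
    for (; i < n; i++, scalar_iters++)
        a[i] += b[i];
    return scalar_iters;
}

/* Whole-body masking: every iteration computes its own mask, so the loop
   runs ceil(n / VF) times and needs no epilogue at all.
   Returns the number of vector iterations executed. */
static size_t add_fully_masked(float *restrict a, const float *restrict b,
                               size_t n)
{
    size_t iters = 0;
    for (size_t i = 0; i < n; i += VF, iters++)
        vec_add(a, b, i, n - i < VF ? n - i : VF);
    return iters;
}

/* Masked epilogue: unmasked main loop, then the same vector body run once
   more with a mask in place of the scalar remainder loop.
   Returns the number of masked epilogue iterations (0 or 1). */
static size_t add_masked_epilogue(float *restrict a, const float *restrict b,
                                  size_t n)
{
    size_t i = 0;
    for (; i + VF <= n; i += VF)
        vec_add(a, b, i, VF);
    if (i == n)
        return 0;
    vec_add(a, b, i, n - i);
    return 1;
}

/* Toy profitability test with made-up per-iteration costs (not GCC's cost
   model): is one masked vector iteration cheaper than the scalar epilogue? */
static int masked_epilogue_profitable(size_t n, size_t vec_cost,
                                      size_t mask_overhead, size_t scalar_cost)
{
    assert(vec_cost > 0 && scalar_cost > 0);
    size_t rem = n % VF;
    if (rem == 0)
        return 0; /* no epilogue to replace */
    return vec_cost + mask_overhead < rem * scalar_cost;
}
```

With n = 37 and VF = 16, the scalar scheme runs 5 scalar epilogue iterations, the masked-epilogue scheme replaces them with a single masked vector iteration, and fully masking the body takes 3 vector iterations with no epilogue. All three produce the same results; they differ only in how the remainder is executed, which is what a cost model has to weigh.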