on 2020/3/18 下午6:40, Richard Biener wrote: > On Wed, Mar 18, 2020 at 11:39 AM Richard Biener > <richard.guent...@gmail.com> wrote: >> >> On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin <li...@linux.ibm.com> wrote: >>> >>> Hi, >>> >>> As PR90332 shows, the current scalar epilogue peeling for gaps >>> elimination requires expected vec_init optab with two half size >>> vector mode. On Power, we don't support vector mode like V8QI, >>> so can't support optab like vec_initv16qiv8qi. But we want to >>> leverage existing scalar mode like DI to init the desirable >>> vector mode. This patch is to extend the existing support for >>> Power, as evaluated on Power9 we can see expected 1.9% speed up >>> on SPEC2017 525.x264_r. >>> >>> Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9. >>> >>> Is it ok for trunk? >> >> There's already code exercising such a case in vectorizable_load >> (VMAT_STRIDED_SLP) which you could have factored out. >> >> vectype, bool slp, >> than the alignment boundary B. Every vector access will >> be a multiple of B and so we are guaranteed to access a >> non-gap element in the same B-sized block. */ >> + machine_mode half_mode; >> if (overrun_p >> && gap < (vect_known_alignment_in_bytes (first_dr_info) >> / vect_get_scalar_dr_size (first_dr_info))) >> - overrun_p = false; >> - >> + { >> + overrun_p = false; >> + if (known_eq (nunits, (group_size - gap) * 2) >> + && known_eq (nunits, group_size) >> + && get_half_mode_for_vector (vectype, &half_mode)) >> + DR_GROUP_HALF_MODE (first_stmt_info) = half_mode; >> + } >> >> why do you need to amend this case? >> >> I don't like storing DR_GROUP_HALF_MODE very much, later >> you need a vector type and it looks cheap enough to recompute >> it where you need it? Iff then it doesn't belong to DR_GROUP >> but to the stmt-info. >> >> I realize the original optimization was kind of a hack (and I was too >> lazy to implement the integer mode construction path ...). >> >> So, can you factor out the existing code into a function returning >> the vector type for construction for a vector type and a >> pieces size? So for V16QI and a pieces-size of 4 we'd >> get either V16QI back (then construction from V4QI pieces >> should work) or V4SI (then construction from SImode pieces >> should work)? Eventually as secondary output provide that >> piece type (SI / V4QI). > > Btw, why not implement the neccessary vector init patterns? >
Power doesn't support 64bit vector size, it looks a bit hacky and confusing to introduce this kind of mode just for some optab requirement, but I admit the optab hack can immediately make it work. :) BR, Kewen