On Tue, 9 Feb 2021, Jakub Jelinek wrote:

> On Tue, Feb 09, 2021 at 12:52:55PM +0100, Richard Biener wrote:
> > Yeah, it does look useful in the end.  Note that you might want
> > to adjust ix86_add_stmt_cost (or ix86_shift_rotate_cost, that is)
> > to reflect the complex expansion.
>
> Yeah, the patch does that, see the i386.c hunks.
>
> I guess for V2DImode vectorization, it will usually be a win only if the
> lack of the optab support would cause much larger loop not to be vectorized,
> but for V4DImode the scalar cost won't be that small already.
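Neither message shows the "complex expansion" itself. As background only, and as an assumption about the kind of sequence involved rather than what the patch emits: SSE2 and AVX2 provide a packed 64-bit logical right shift (psrlq) but no packed 64-bit arithmetic right shift (that arrives with AVX-512's vpsraq). One well-known emulation builds the arithmetic shift from a logical shift, an XOR and a subtraction, which is why a single source-level `>>` can cost several vector instructions. The scalar sketch below uses the hypothetical helper name `sra64_emulated`:

```c
#include <stdint.h>

/* Hypothetical helper: arithmetic right shift of a signed 64-bit value
   built only from a logical shift, an XOR and a subtraction.
   Valid for 0 <= n <= 63.  */
static int64_t sra64_emulated(int64_t x, unsigned n)
{
    uint64_t r = (uint64_t) x >> n;        /* logical shift: zeros shifted in */
    uint64_t m = UINT64_C(1) << (63 - n);  /* position the old sign bit moved to */
    /* XOR then subtract sign-extends from bit 63 - n; the final conversion
       back to int64_t is modulo 2^64 on GCC.  */
    return (int64_t) ((r ^ m) - m);
}
```

For example, `sra64_emulated(-256, 4)` yields -16 and `sra64_emulated(1000, 3)` yields 125. In a vector version each of the three steps would be a separate instruction on every lane, plus whatever is needed to materialize `m`.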
Due to how we cost loads and stores, I guess even V2DImode vectorization of

  long di[2];
  void foo ()
  {
    di[0] >>= 7;
    di[1] >>= 7;
  }

will be considered profitable: scalar and vector loads/stores each cost 12, compared to 4 for the shift, so vectorizing the loads and stores gives us a budget of 24 that we can eat into to make the vector shift profitable.

Richard.
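The budget arithmetic for `foo` can be spelled out as below. This is a simplified sketch: it assumes the uniform costs quoted in the message (12 per scalar or vector load/store, 4 per scalar shift) and ignores every other term the vectorizer's cost model adds (prologue/epilogue, vector construction, tuning-specific tables in i386.c). The function names are hypothetical.

```c
/* Costs as quoted in the message; assumed uniform here, whereas the real
   ix86 cost tables depend on the tuning and the mode.  */
#define MEM_COST   12  /* one scalar or vector load/store */
#define SHIFT_COST  4  /* one scalar DImode shift */

/* Scalar code for foo(): 2 loads + 2 shifts + 2 stores.  */
static int scalar_cost(void)
{
    return 2 * MEM_COST + 2 * SHIFT_COST + 2 * MEM_COST;  /* 56 */
}

/* V2DImode code minus the shift itself: 1 vector load + 1 vector store.  */
static int vector_cost_without_shift(void)
{
    return 1 * MEM_COST + 1 * MEM_COST;  /* 24 */
}

/* Saving from merging four scalar memory accesses into two vector ones.  */
static int memory_saving(void)
{
    return 4 * MEM_COST - 2 * MEM_COST;  /* 48 - 24 = 24 */
}

/* Largest cost the vector shift may have before the vector code
   becomes more expensive than the scalar code.  */
static int vector_shift_break_even(void)
{
    return scalar_cost() - vector_cost_without_shift();  /* 56 - 24 = 32 */
}
```

Under these assumptions the 24 saved on memory accesses plus the 8 for the two scalar shifts it replaces means a vector shift expansion costing up to 32 still breaks even, which is why even a multi-instruction V2DImode shift can look profitable here.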