On Tue, 9 Feb 2021, Jakub Jelinek wrote:

> On Tue, Feb 09, 2021 at 12:52:55PM +0100, Richard Biener wrote:
> > Yeah, it does look useful in the end.  Note that you might want
> > to adjust ix86_add_stmt_cost (or ix86_shift_rotate_cost, that is)
> > to reflect the complex expansion.
> 
> Yeah, the patch does that, see the i386.c hunks.
> 
> I guess for V2DImode vectorization, it will usually be a win only if the
> lack of optab support would otherwise cause a much larger loop not to be
> vectorized, but for V4DImode the scalar cost won't be that small already.

Due to how we cost loads and stores I guess even V2DImode vectorization of

long di[2];
void foo ()
{
 di[0] >>= 7;
 di[1] >>= 7;
}

will be considered profitable: scalar and vector loads/stores each cost 12,
compared to 4 for the scalar shift, so replacing the four scalar
loads/stores with two vector ones leaves a budget of 24 that the vector
shift can eat into while the vector version still comes out as profitable.
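
To spell out that arithmetic, here is a rough sketch.  It is not the
vectorizer's cost model code, just the sums above written down; the
function names are made up for illustration, the unit costs are the ones
quoted above, and the strict "vector cost < scalar cost" comparison is an
assumption about how profitability is decided.

```c
/* Illustrative arithmetic only, not GCC code.  Unit costs are the ones
   quoted in the mail; all names here are hypothetical.  */
enum { MEM_COST = 12, SCALAR_SHIFT_COST = 4 };

/* Scalar version of foo: two loads, two shifts, two stores.  */
static int
scalar_cost (void)
{
  return 2 * MEM_COST + 2 * SCALAR_SHIFT_COST + 2 * MEM_COST;
}

/* Vector version: one V2DI load, one vector shift, one V2DI store.  */
static int
vector_cost (int vector_shift_cost)
{
  return MEM_COST + vector_shift_cost + MEM_COST;
}

/* Savings from the memory operations alone: four scalar loads/stores
   replaced by two vector ones.  */
static int
load_store_budget (void)
{
  return 4 * MEM_COST - 2 * MEM_COST;
}

/* Assumed criterion: profitable when the vector version is strictly
   cheaper than the scalar one.  */
static int
vectorization_profitable (int vector_shift_cost)
{
  return vector_cost (vector_shift_cost) < scalar_cost ();
}
```

Under these assumptions the load/store budget is 24, and once the two
scalar shifts that go away are counted as well, the vector shift could be
costed at anything below 32 and foo would still be vectorized.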

Richard.
