On Tue, Feb 09, 2021 at 12:52:55PM +0100, Richard Biener wrote:
> Yeah, it does look useful in the end.  Note that you might want
> to adjust ix86_add_stmt_cost (or ix86_shift_rotate_cost, that is)
> to reflect the complex expansion.

Yeah, the patch does that, see the i386.c hunks.  I guess for V2DImode
vectorization it will usually be a win only if the lack of optab support
would otherwise cause a much larger loop not to be vectorized, but for
V4DImode the scalar cost already won't be that small.

	Jakub