https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492
--- Comment #9 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Ok. After investigating LLVM:

Before the loop vectorizer:

  %cond12 = tail call i32 @llvm.usub.sat.i32(i32 %conv5, i32 %wsize)
  %conv13 = trunc i32 %cond12 to i16

After the loop vectorizer:

  %10 = call <16 x i32> @llvm.usub.sat.v16i32(<16 x i32> %9, <16 x i32> %broadcast.splat)
  %11 = trunc <16 x i32> %10 to <16 x i16>

I think GCC can follow this approach: first recognize the scalar saturating operation, then let the loop vectorizer turn it into the corresponding vector saturating operation.
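
For reference, a minimal C sketch of a scalar loop that could plausibly produce IR of this shape. The names usub_sat_u32 and slide_hash_like are hypothetical, chosen for illustration only, and this is an assumption about the source form, not the actual testcase from this PR. The conditional expression is the scalar idiom that would need to be recognized as an unsigned saturating subtraction before vectorization.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: unsigned saturating subtraction written as a
   plain conditional.  The result is clamped at 0 instead of wrapping. */
static inline uint32_t usub_sat_u32(uint32_t a, uint32_t b)
{
    return a >= b ? a - b : 0;
}

/* Hypothetical loop: widen each 16-bit element to 32 bits, subtract
   wsize with saturation, then truncate back to 16 bits.  This mirrors
   the usub.sat + trunc pair in the IR above. */
void slide_hash_like(uint16_t *p, size_t n, uint32_t wsize)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t m = p[i];
        p[i] = (uint16_t)usub_sat_u32(m, wsize);
    }
}
```

If the conditional is recognized as a saturating subtraction at the scalar level, the vectorizer only has to map one scalar operation to its vector counterpart, rather than match the whole compare/select/subtract sequence itself.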