https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

--- Comment #9 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
OK. I investigated what LLVM does:

Before loop vectorizer:

  %cond12 = tail call i32 @llvm.usub.sat.i32(i32 %conv5, i32 %wsize)
  %conv13 = trunc i32 %cond12 to i16

After loop vectorizer:

  %10 = call <16 x i32> @llvm.usub.sat.v16i32(<16 x i32> %9, <16 x i32> %broadcast.splat)
  %11 = trunc <16 x i32> %10 to <16 x i16>

I think GCC can follow this approach: first recognize the scalar saturating
subtraction, then let the loop vectorizer turn it into the corresponding
vector saturating operation.
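
For reference, here is a minimal C sketch of the kind of scalar loop that
could produce IR like the above. This is an illustration only, not the actual
test case from this PR: the names usub_sat_u32, slide, table and n are
hypothetical, and only wsize is taken from the IR. The conditional form
"a >= b ? a - b : 0" is one common way to write an unsigned saturating
subtraction; the narrowing store corresponds to the trunc to i16.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: unsigned saturating subtraction.
   Returns a - b, clamped to 0 when b > a. */
static inline uint32_t usub_sat_u32(uint32_t a, uint32_t b)
{
    return a >= b ? a - b : 0;
}

/* Hypothetical loop: widen each 16-bit entry to 32 bits, subtract wsize
   with saturation, then truncate the result back to 16 bits. */
void slide(uint16_t *table, size_t n, uint32_t wsize)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t m = table[i];
        table[i] = (uint16_t)usub_sat_u32(m, wsize);
    }
}
```

If the scalar pattern is recognized as a single saturating operation before
vectorization, the vectorizer only has to map one scalar operation to its
vector form instead of matching a compare-and-select sequence.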
