https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378

Dusan Stojkovic <dusan.stojko...@rt-rk.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dusan.stojko...@rt-rk.com

--- Comment #6 from Dusan Stojkovic <dusan.stojko...@rt-rk.com> ---
Here are some examples from inter_pred_filter functions in x265:
https://godbolt.org/z/7asjsqcj8
They are generalized to show different variations to consider.

The examples show how introducing a temporary variable before storing the
result of the clipping produces:
```
vmsge.vi v0,v1,0
vsetvli zero,zero,e8,mf2,ta,mu
vnsrl.wi v6,v1,0,v0.t
vsetvli zero,zero,e16,m1,ta,ma
vmsle.vv v0,v1,v5
vsetvli zero,zero,e8,mf2,ta,ma
vmerge.vvm v1,v3,v6,v0
```
Shouldn't both cases generate the vmax/vmin/truncate pattern at least?

Curiously, when doing a signed clip, GCC takes two different approaches; this
time the choice doesn't depend on the type of store, but rather on the size
difference between the type being clipped and the resulting type.

At the end there is a case with two functions which could be optimized with:
```
...
csrwi vxrm,0
...
vnclipu.wi v1,v1,6
...
```
Here GCC chooses vmax/vmin/truncate regardless of whether a temporary variable
is introduced.
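
For readers without the godbolt link open, the source shapes under discussion might look roughly like the sketch below. This is not the code from the link: the function names, the [0, 255] range, the int16_t/uint8_t types, and the shift amount of 6 (taken only from the `vnclipu.wi v1,v1,6` line above) are assumptions for illustration. The third function is a scalar model of what a max-with-zero followed by a rounding, saturating narrowing shift would compute, assuming `vxrm = 0` selects round-to-nearest-up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant 1: clip to [0, 255] and store the result directly. */
void clip_direct(uint8_t *dst, const int16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(src[i] < 0 ? 0 : (src[i] > 255 ? 255 : src[i]));
}

/* Hypothetical variant 2: same clip, but through a temporary variable
 * that is clamped step by step before the store. */
void clip_with_temp(uint8_t *dst, const int16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int16_t val = src[i];
        if (val < 0)
            val = 0;
        if (val > 255)
            val = 255;
        dst[i] = (uint8_t)val;
    }
}

/* Scalar model (assumption): clamp negatives to zero, add half of the
 * shifted-out range (32 for a shift of 6), shift right by 6, then
 * saturate to the unsigned 8-bit range. */
void round_shift_clip(uint8_t *dst, const int16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = src[i] < 0 ? 0u : (uint32_t)src[i]; /* max(x, 0) */
        v = (v + 32u) >> 6;                              /* rounding shift */
        dst[i] = (uint8_t)(v > 255u ? 255u : v);         /* saturate */
    }
}
```

The first two functions compute the same values for every input, so any difference in the generated vector code comes from how the clip is spelled, not from its semantics.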