https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378

Dusan Stojkovic <dusan.stojko...@rt-rk.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dusan.stojko...@rt-rk.com

--- Comment #6 from Dusan Stojkovic <dusan.stojko...@rt-rk.com> ---
Here are some examples from inter_pred_filter functions in x265:
https://godbolt.org/z/7asjsqcj8

They are generalized to show different variations to consider. The examples
show how introducing a temporary variable before storing the result of the
clipping produces:
```
        vmsge.vi        v0,v1,0
        vsetvli zero,zero,e8,mf2,ta,mu
        vnsrl.wi        v6,v1,0,v0.t
        vsetvli zero,zero,e16,m1,ta,ma
        vmsle.vv        v0,v1,v5
        vsetvli zero,zero,e8,mf2,ta,ma
        vmerge.vvm      v1,v3,v6,v0
```

Shouldn't both cases generate the vmax/vmin/truncate pattern at least?
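
For readers without the Godbolt link open, the two source shapes being compared
can be sketched roughly as follows. This is an illustrative reduction, not the
code from the link: the function names clip_direct and clip_with_temp, the
int16_t source type and the 0..255 range are all assumptions.

```c
#include <stdint.h>

/* Hypothetical reduction: clip a 16-bit intermediate to 0..255 and
   store the result straight into the 8-bit destination. */
void clip_direct(uint8_t *dst, const int16_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (uint8_t)(src[i] < 0 ? 0 : (src[i] > 255 ? 255 : src[i]));
}

/* Same clip, but the clipped value lives in a temporary before the store. */
void clip_with_temp(uint8_t *dst, const int16_t *src, int n)
{
    for (int i = 0; i < n; i++) {
        int16_t val = src[i];
        if (val < 0)
            val = 0;
        if (val > 255)
            val = 255;
        dst[i] = (uint8_t)val;
    }
}
```

Both functions compute the same result for every input, so any difference in
the generated vector code comes from how the clip is written, not from its
semantics.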

Curiously, when doing a signed clip, GCC takes two different approaches; this
time the choice depends not on the type of store, but on the size difference
between the type being clipped and the resulting type.
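
As a rough sketch of what "size difference" means here, the pair below performs
the same signed clip to int8_t from two different source widths. The names
sclip_16_to_8 and sclip_32_to_8 and the chosen widths are assumptions made for
illustration; the Godbolt examples may use other types.

```c
#include <stdint.h>

/* Hypothetical: signed clip where the source is twice as wide as the result. */
void sclip_16_to_8(int8_t *dst, const int16_t *src, int n)
{
    for (int i = 0; i < n; i++) {
        int16_t v = src[i];
        dst[i] = (int8_t)(v < -128 ? -128 : (v > 127 ? 127 : v));
    }
}

/* Hypothetical: the same clip where the source is four times as wide. */
void sclip_32_to_8(int8_t *dst, const int32_t *src, int n)
{
    for (int i = 0; i < n; i++) {
        int32_t v = src[i];
        dst[i] = (int8_t)(v < -128 ? -128 : (v > 127 ? 127 : v));
    }
}
```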

At the end there is a case with two functions which could be optimized with:
```
        ...
        csrwi   vxrm,0
        ...
        vnclipu.wi      v1,v1,6
        ...
```
Here GCC chooses vmax/vmin/truncate regardless of whether a temporary variable
is introduced.
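
For reference, a scalar sketch of what that instruction pair computes: with
vxrm set to 0 (round-to-nearest-up), vnclipu.wi with a shift of 6 adds half of
the divisor, shifts right by 6, and saturates to the unsigned narrow type. The
names round_shift_clip_u8 and store_row and the uint16_t source type are
assumptions for illustration, not the code from the Godbolt link.

```c
#include <stdint.h>

/* Hypothetical scalar equivalent of vnclipu.wi vd,vs,6 under vxrm=0 (rnu):
   add 1 << (6 - 1), shift right by 6, saturate to 0..255. */
uint8_t round_shift_clip_u8(uint16_t x)
{
    uint32_t r = ((uint32_t)x + 32u) >> 6;   /* widen so the add cannot wrap */
    return (uint8_t)(r > 255u ? 255u : r);
}

/* Apply the rounding shift and clip across a row of samples. */
void store_row(uint8_t *dst, const uint16_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = round_shift_clip_u8(src[i]);
}
```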
