lgtm.







 ----------Reply to Message----------
 On Wed, Jan 22, 2025 16:04 PM Robin Dapp <rdapp....@gmail.com> wrote:

  > Could you show me a piece of the codegen difference in X264 that
  > improves performance?

I have one ready from SATD (see PR117173), there are more.

"Before":
_838 = VEC_PERM_EXPR <vect__49.83_41, vect__52.84_40, { 1, 11, 1, 11, 5, 15, 5, 15 }>;
_846 = VEC_PERM_EXPR <vect__49.83_41, vect__52.84_40, { 0, 10, 0, 10, 4, 14, 4, 14 }>;

"After":
_42 = VEC_PERM_EXPR <vect__49.83_41, vect__52.84_40, { 0, 1, 10, 11, 4, 5, 14, 15 }>;
...
_44 = VEC_PERM_EXPR <vect_t0_114.85_43, vect_t0_114.85_43, { 1, 3, 1, 3, 5, 7, 5, 7 }>;
_45 = VEC_PERM_EXPR <vect_t0_114.85_43, vect_t0_114.85_43, { 0, 2, 0, 2, 4, 6, 4, 6 }>;

"After" is matched in match.pd and converted to "before".  "Before"
requires two masked, complex gathers, while "after" needs no masking,
just a vmerge and single-source gathers.
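
To make the equivalence of the two forms easier to check, here is a small
Python sketch that models the permutes on symbolic lanes.  It assumes the
usual VEC_PERM_EXPR semantics (selector indices address the concatenation
of the two input vectors); the helper vec_perm and the lane names a0..a7 /
b0..b7 are illustrative only and do not come from GCC.

```python
def vec_perm(v0, v1, sel):
    """Model of VEC_PERM_EXPR <v0, v1, sel> (assumed semantics):
    each selector index picks a lane from the concatenation v0 ++ v1."""
    both = list(v0) + list(v1)
    return [both[i % len(both)] for i in sel]

# Symbolic lanes standing in for vect__49.83_41 (a) and vect__52.84_40 (b).
a = [f"a{i}" for i in range(8)]
b = [f"b{i}" for i in range(8)]

# "Before": two permutes that each read from both sources.
before_838 = vec_perm(a, b, [1, 11, 1, 11, 5, 15, 5, 15])
before_846 = vec_perm(a, b, [0, 10, 0, 10, 4, 14, 4, 14])

# "After": one lane-wise blend (no lane changes position, so a merge is
# enough), followed by two permutes that read from a single source.
t = vec_perm(a, b, [0, 1, 10, 11, 4, 5, 14, 15])
after_44 = vec_perm(t, t, [1, 3, 1, 3, 5, 7, 5, 7])
after_45 = vec_perm(t, t, [0, 2, 0, 2, 4, 6, 4, 6])

# Both forms select the same lanes.
assert after_44 == before_838
assert after_45 == before_846
```

The first "after" permute keeps every lane in place and only chooses
between the two inputs per lane, which is why it can be done as a merge
rather than a gather.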
