https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118622
Bug ID: 118622
Summary: vshrn_n_u16 with a vmvnq_u16 should produce the same code as a vsubhn_u16 with -1
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64

Take:

```
#include <arm_neon.h>

uint8x8_t neg_narrow(uint16x8_t a)
{
  uint16x8_t b = vmvnq_u16(a);
  return vshrn_n_u16(b, 8);
}

uint8x8_t neg_narrow_vsubhn(uint16x8_t a)
{
  uint16x8_t ones = vdupq_n_u16(0xffff);
  return vsubhn_u16(ones, a);
}
```

GCC should produce the same code for both of these functions.

For neg_narrow, combine reports:

```
Trying 6 -> 7:
    6: r104:V8HI=~r106:V8HI
      REG_DEAD r106:V8HI
    7: r102:V8QI=trunc(r104:V8HI 0>>const_vector)
      REG_DEAD r104:V8HI
Failed to match this instruction:
(set (reg:V8QI 102 [ <retval> ])
    (truncate:V8QI (lshiftrt:V8HI (not:V8HI (reg:V8HI 106 [ a ]))
            (const_vector:V8HI [
                    (const_int 8 [0x8]) repeated x8
                ]))))
```

That is OK. For neg_narrow_vsubhn, combine reports:

```
Trying 6 -> 7:
    6: r103:V8HI=const_vector
    7: r101:V8QI=trunc(r103:V8HI-r105:V8HI>>const_vector)
      REG_DEAD r105:V8HI
      REG_DEAD r103:V8HI
Failed to match this instruction:
(set (reg:V8QI 101 [ <retval> ])
    (truncate:V8QI (ashiftrt:V8HI (not:V8HI (reg:V8HI 105 [ a ]))
            (const_vector:V8HI [
                    (const_int 8 [0x8]) repeated x8
                ]))))
```

Here the subtraction from the all-ones constant has already been simplified to (not a), which is exact because 0xffff - a == ~a for 16-bit elements. Notice that the only remaining difference is lshiftrt vs. ashiftrt. Because of the truncate, we only keep the high half of each 16-bit element (bits 8-15). The bits that a right shift by 8 fills in (zeros for lshiftrt, copies of the sign bit for ashiftrt) all land in the upper 8 bits, which the truncate discards, so logical vs. arithmetic shift does not matter here. So we should match both forms and then split them back into what the original IR for neg_narrow_vsubhn was (the all-ones constant feeding the subtract-high-narrow, i.e. subhn).

I should note that LLVM canonicalizes this to the neg_narrow form, but neg_narrow_vsubhn can be faster in some (all?) cases.