https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117012
            Bug ID: 117012
           Summary: [15 Regression] incorrect RTL simplification around vector
                    AND and shifts
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*

The following example:

#include <arm_neon.h>
#include <stdint.h>

uint8x16_t f (uint8x16_t x)
{
  uint8x16_t mask = vreinterpretq_u8_u64(vdupq_n_u64 (0x101));
  return vandq_u8(vcltq_s8(vreinterpretq_s8_u8(x), vdupq_n_s8(0)), mask);
}

compiled at -O3 gives the following:

f:
        ushr    v0.16b, v0.16b, 7
        ret

This is incorrect as it assumes that the value in every lane for the AND was
0x1 where in fact only the bottom lane is.

combine is matching this incorrect pattern:

Trying 7, 6 -> 8:
    7: r108:V16QI=const_vector
    6: r107:V16QI=r109:V16QI>>const_vector
      REG_DEAD r109:V16QI
    8: r106:V16QI=r107:V16QI&r108:V16QI
      REG_DEAD r108:V16QI
      REG_DEAD r107:V16QI
      REG_EQUAL r107:V16QI&const_vector
Successfully matched this instruction:
(set (reg:V16QI 106 [ _5 ])
    (lshiftrt:V16QI (reg:V16QI 109 [ xD.22802 ])
        (const_vector:V16QI [
                (const_int 7 [0x7]) repeated x16
            ])))

The optimization seems to only look at the bottom lane of the vector:

#include <arm_neon.h>
#include <stdint.h>

uint8x16_t f (uint8x16_t x)
{
  uint8x16_t mask = vreinterpretq_u8_u64(vdupq_n_u64 (0x301));
  return vandq_u8(vcltq_s8(vreinterpretq_s8_u8(x), vdupq_n_s8(0)), mask);
}

also generates incorrect code, but changing the bottom lane

#include <arm_neon.h>
#include <stdint.h>

uint8x16_t f (uint8x16_t x)
{
  uint8x16_t mask = vreinterpretq_u8_u64(vdupq_n_u64 (0x102));
  return vandq_u8(vcltq_s8(vreinterpretq_s8_u8(x), vdupq_n_s8(0)), mask);
}

gives the right result.