https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97249
Bug ID: 97249
Summary: Missing vec_select and subreg optimization
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: crazylht at gmail dot com
CC: hjl.tools at gmail dot com, wwwhhhyyy333 at gmail dot com
Target Milestone: ---
Host: x86_64-pc-linux-gnu

cat test.c
---
void
foo (unsigned char* p1, unsigned char* p2, short* __restrict p3)
{
  for (int i = 0 ; i != 8; i++)
    p3[i] = p1[i] + p2[i];
  return;
}
---

gcc11 -Ofast -mavx2 test.c produces
---
foo:
.LFB0:
        .cfi_startproc
        vmovq   (%rdi), %xmm0
        vmovq   (%rsi), %xmm1
        vpmovzxbw       %xmm0, %xmm0
        vpmovzxbw       %xmm1, %xmm1
        vpaddw  %xmm1, %xmm0, %xmm0
        vmovdqu %xmm0, (%rdx)
        ret
        .cfi_endproc
---

The memory operand is not propagated into *vpmovzxbw* because RTL does not simplify
---
(insn 9 8 10 2 (set (reg:V8HI 92 [ vect__33.6 ])
        (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V8QI 91 [ vect__40.5 ]) 0)
                (parallel [
                        (const_int 0 [0])
                        (const_int 1 [0x1])
                        (const_int 2 [0x2])
                        (const_int 3 [0x3])
                        (const_int 4 [0x4])
                        (const_int 5 [0x5])
                        (const_int 6 [0x6])
                        (const_int 7 [0x7])
                    ])))) "test.c":5:16 4638 {sse4_1_zero_extendv8qiv8hi2}
     (expr_list:REG_DEAD (reg:V8QI 91 [ vect__40.5 ])
        (nil)))
---
to
---
(insn 9 8 10 2 (set (reg:V8HI 92 [ vect__33.6 ])
        (zero_extend:V8HI (reg:V8QI 91 [ vect__40.5 ]))) "test.c":5:16 4638 {sse4_1_zero_extendv8qiv8hi2}
     (expr_list:REG_DEAD (reg:V8QI 91 [ vect__40.5 ])
        (nil)))
---

The same applies to other vector modes.
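For illustration only, the equivalence that the RTL simplification would rely on can be modelled in plain C. This is a sketch, not GCC code: every function name below is hypothetical, and the `junk` parameter is an assumption that stands in for the undefined upper lanes of the paradoxical subreg.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the two RTL forms; not GCC source. */

/* (subreg:V16QI (reg:V8QI) 0): the low 8 lanes are the source, the
   upper 8 lanes are undefined.  `junk` stands in for that garbage.  */
void paradoxical_subreg_v16qi(const uint8_t src[8], uint8_t junk,
                              uint8_t dst[16])
{
    memcpy(dst, src, 8);
    memset(dst + 8, junk, 8);
}

/* (vec_select:V8QI x (parallel [0 .. 7])): pick lanes 0..7 in order.  */
void vec_select_low8(const uint8_t src[16], uint8_t dst[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = src[i];
}

/* (zero_extend:V8HI x): widen each unsigned byte to 16 bits.  */
void zero_extend_v8hi(const uint8_t src[8], uint16_t dst[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint16_t)src[i];
}

/* Unsimplified form: zero_extend (vec_select (subreg x)).  */
void extend_via_subreg(const uint8_t src[8], uint8_t junk, uint16_t dst[8])
{
    uint8_t wide[16], low[8];
    paradoxical_subreg_v16qi(src, junk, wide);
    vec_select_low8(wide, low);
    zero_extend_v8hi(low, dst);
}

/* Simplified form: zero_extend x.  */
void extend_direct(const uint8_t src[8], uint16_t dst[8])
{
    zero_extend_v8hi(src, dst);
}

/* Returns 1 when both forms produce identical lanes, else 0.  */
int forms_agree(const uint8_t src[8], uint8_t junk)
{
    uint16_t a[8], b[8];
    extend_via_subreg(src, junk, a);
    extend_direct(src, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Because the selected lanes 0..7 never read the upper half, `forms_agree` holds in this model whatever value `junk` takes, which is why the vec_select/subreg pair should be removable.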