https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86209
Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ramana at gcc dot gnu.org

--- Comment #1 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
(In reply to sameerad from comment #0)
> While implementing a peephole2 for combining shorter-type loads/stores into
> a larger-type load/store, the following testcase was found for aarch64, for
> which the peephole does not fire because the types of the zero/sign-extended
> operands are not the same.
>
> Test program:
> unsigned short
> subus (unsigned short *array)
> {
>   return array[0] + array[1];
> }
>
> Expander-generated RTL:
> (insn 6 3 7 2 (set (reg:HI 96)
>         (mem:HI (reg/v/f:DI 94 [ array ]) [1 *array_4(D)+0 S2 A16]))
>      (nil))
> (insn 7 6 8 2 (set (reg:HI 97)
>         (mem:HI (plus:DI (reg/v/f:DI 94 [ array ])
>                 (const_int 2 [0x2])) [1 MEM[(short unsigned int *)array_4(D) + 2B]+0 S2 A16]))
>      (nil))
> (insn 8 7 9 2 (set (reg:SI 99)
>         (subreg:SI (reg:HI 97) 0))
>      (nil))
> (insn 9 8 10 2 (set (reg:SI 98)
>         (plus:SI (subreg:SI (reg:HI 96) 0)
>             (reg:SI 99)))
>      (expr_list:REG_EQUAL (plus:SI (subreg:SI (reg:HI 96) 0)
>             (subreg:SI (reg:HI 97) 0))
>         (nil)))
>
> The combiner combines insns 7 and 8 to generate a zero extension to SImode.
>
> (insn 8 7 9 2 (set (reg:SI 99 [ MEM[(short unsigned int *)array_4(D) + 2B] ])
>         (zero_extend:SI (mem:HI (plus:DI (reg/v/f:DI 94 [ array ])
>                     (const_int 2 [0x2])) [1 MEM[(short unsigned int *)array_4(D) + 2B]+0 S2 A16]))) {*zero_extendhisi2_aarch64}
>      (expr_list:REG_DEAD (reg/v/f:DI 94 [ array ])
>         (nil)))
>
> The reload pass removes the SUBREGs, which hold the information about the
> desired type; because of this, the HImode regs are zero-extended to DImode.
>
> (insn 8 7 6 2 (set (reg:SI 1 x1 [orig:99 MEM[(short unsigned int *)array_4(D) + 2B] ] [99])
>         (zero_extend:SI (mem:HI (plus:DI (reg/v/f:DI 0 x0 [orig:94 array ] [94])
>                     (const_int 2 [0x2])) [1 MEM[(short unsigned int *)array_4(D) + 2B]+0 S2 A16]))) {*zero_extendhisi2_aarch64}
>      (nil))
> (insn 6 8 9 2 (set (reg:DI 0 x0)
>         (zero_extend:DI (mem:HI (reg/v/f:DI 0 x0 [orig:94 array ] [94]) [1 *array_4(D)+0 S2 A16]))) {*zero_extendhidi2_aarch64}
>      (nil))
> (insn 9 6 14 2 (set (reg:SI 0 x0 [98])
>         (plus:SI (reg:SI 0 x0 [orig:96 *array_4(D) ] [96])
>             (reg:SI 1 x1 [orig:99 MEM[(short unsigned int *)array_4(D) + 2B] ] [99]))) {*addsi3_aarch64}
>      (nil))
> (insn 14 9 15 2 (set (reg/i:HI 0 x0)
>         (reg:HI 0 x0 [98])) {*movhi_aarch64}
>      (nil))
> (insn 15 14 17 2 (use (reg/i:HI 0 x0))
>      (nil))
> (note 17 15 18 NOTE_INSN_DELETED)
> (note 18 17 0 NOTE_INSN_DELETED)
>
> Now, as the two memory accesses have different extended types, they cannot
> be combined by the peephole.
>
> Because of this, even when sched_fusion has brought the loads/stores closer
> together, they cannot be merged.

Hmmm,

ldr w0, [x0]
ldr w1, [x0, 2]

is not the same as

ldp w0, w1, [x0]

ldp w0, w1, [x0] is the same as merging

ldr w0, [x0]
ldr w1, [x0, 4]

Am I missing something? That would mean it isn't possible to merge this
combination.

Thoughts...
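The offset mismatch in the reply can be made concrete with a small C model. This is only a sketch under stated assumptions: the model_* helpers and the subus_two_ldrh/subus_bad_ldp wrappers are hypothetical names (not GCC or AArch64 APIs), and each helper emulates nothing but the memory access of the instruction it is named after. The two extending loads are modelled as halfword (ldrh) loads, matching the zero_extend:SI (mem:HI) RTL in the quoted dump.

```c
#include <stdint.h>
#include <string.h>

/* The test program from comment #0.  */
unsigned short
subus (unsigned short *array)
{
  return array[0] + array[1];
}

/* Hypothetical helpers modelling AArch64 loads on a byte buffer.
   memcpy keeps each access well defined for any alignment.  */

/* ldrh wN, [base, off]: 16-bit load, zero-extended to 32 bits.  */
static uint32_t
model_ldrh (const unsigned char *base, size_t off)
{
  uint16_t h;
  memcpy (&h, base + off, sizeof h);
  return h;
}

/* ldr wN, [base, off]: 32-bit load.  */
static uint32_t
model_ldr_w (const unsigned char *base, size_t off)
{
  uint32_t w;
  memcpy (&w, base + off, sizeof w);
  return w;
}

/* ldp wA, wB, [base]: two consecutive 32-bit loads, from offsets 0 and 4.  */
static void
model_ldp_w (const unsigned char *base, uint32_t *a, uint32_t *b)
{
  *a = model_ldr_w (base, 0);
  *b = model_ldr_w (base, 4);
}

/* Model of subus as compiled: halfword loads from offsets 0 and 2.  */
unsigned short
subus_two_ldrh (const unsigned char *p)
{
  return (unsigned short) (model_ldrh (p, 0) + model_ldrh (p, 2));
}

/* Model of subus if the two loads were wrongly fused into
   ldp w0, w1, [x0]: the second word now comes from offset 4.  */
unsigned short
subus_bad_ldp (const unsigned char *p)
{
  uint32_t w0, w1;
  model_ldp_w (p, &w0, &w1);
  return (unsigned short) (w0 + w1);
}
```

With unsigned short array[4] = {1, 2, 3, 4}, subus_two_ldrh ((const unsigned char *) array) returns 3, the same as subus (array). By contrast, subus_bad_ldp returns 4 on a little-endian host, because the pair's second word covers array[2] and array[3] rather than array[1].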