https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86209

Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ramana at gcc dot gnu.org

--- Comment #1 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
(In reply to sameerad from comment #0)
> While implementing peephole2 for combining shorter types loads/stores into
> larger type load/store, following testcase was found for aarch64 for which
> peephole does not happen because the type of zero/sign extended operands is
> not the same.
> 
> Test program:
> unsigned short
> subus (unsigned short *array)
> {
>   return array[0] + array[1];
> }
> 
> Expander generated RTL:
> (insn 6 3 7 2 (set (reg:HI 96)
>         (mem:HI (reg/v/f:DI 94 [ array ]) [1 *array_4(D)+0 S2 A16]))
>      (nil))
> (insn 7 6 8 2 (set (reg:HI 97)
>         (mem:HI (plus:DI (reg/v/f:DI 94 [ array ])
>                 (const_int 2 [0x2])) [1 MEM[(short unsigned int *)array_4(D)
> + 2B]+0 S2 A16]))
>      (nil))
> (insn 8 7 9 2 (set (reg:SI 99)
>         (subreg:SI (reg:HI 97) 0))
>      (nil))
> (insn 9 8 10 2 (set (reg:SI 98)
>         (plus:SI (subreg:SI (reg:HI 96) 0)
>             (reg:SI 99)))
>      (expr_list:REG_EQUAL (plus:SI (subreg:SI (reg:HI 96) 0)
>             (subreg:SI (reg:HI 97) 0))
>         (nil)))
> 
> The combiner combines insn 7 and 8 to generate zero extension to SI mode.
>  
> (insn 8 7 9 2 (set (reg:SI 99 [ MEM[(short unsigned int *)array_4(D) + 2B] ])
>         (zero_extend:SI (mem:HI (plus:DI (reg/v/f:DI 94 [ array ])
>                     (const_int 2 [0x2])) [1 MEM[(short unsigned int
> *)array_4(D) + 2B]+0 S2 A16]))) {*zero_extendhisi2_aarch64}
>      (expr_list:REG_DEAD (reg/v/f:DI 94 [ array ])
>         (nil)))
> 
>  The reload pass removes SUBREGs, which holds information about desired
> type, because of which HImode regs are zero extended to DImode.
> 
> (insn 8 7 6 2 (set (reg:SI 1 x1 [orig:99 MEM[(short unsigned int
> *)array_4(D) + 2B] ] [99])
>         (zero_extend:SI (mem:HI (plus:DI (reg/v/f:DI 0 x0 [orig:94 array ]
> [94])
>                     (const_int 2 [0x2])) [1 MEM[(short unsigned int
> *)array_4(D) + 2B]+0 S2 A16]))) {*zero_extendhisi2_aarch64}
>      (nil))
> (insn 6 8 9 2 (set (reg:DI 0 x0)
>         (zero_extend:DI (mem:HI (reg/v/f:DI 0 x0 [orig:94 array ] [94]) [1
> *array_4(D)+0 S2 A16]))) {*zero_extendhidi2_aarch64}
>      (nil))
> (insn 9 6 14 2 (set (reg:SI 0 x0 [98])
>         (plus:SI (reg:SI 0 x0 [orig:96 *array_4(D) ] [96])
>             (reg:SI 1 x1 [orig:99 MEM[(short unsigned int *)array_4(D) + 2B]
> ] [99]))){*addsi3_aarch64}
>      (nil))
> (insn 14 9 15 2 (set (reg/i:HI 0 x0)
>         (reg:HI 0 x0 [98])) {*movhi_aarch64}
>      (nil))
> (insn 15 14 17 2 (use (reg/i:HI 0 x0)) 
>      (nil))
> (note 17 15 18 NOTE_INSN_DELETED)
> (note 18 17 0 NOTE_INSN_DELETED)
> 
> Now as both memory accesses have different extended types, they cannot be
> combined by peephole.
> 
> Because of this, even when sched_fusion has brought the loads/stores closer,
> they cannot be merged.

Hmmm,

ldrh w0, [x0]
ldrh w1, [x0, 2]

is not the same as 

ldp w0, w1, [x0]

ldp w0, w1, [x0] is the same as merging

ldr w0, [x0]
ldr w1, [x0, 4]

Am I missing something? If not, the two loads in this testcase cannot be
merged into a single ldp.
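
To make the offset argument concrete, here is a rough C model of the two
access patterns. It is only a sketch: the helper names load_two_halfwords
and load_pair_words are made up for illustration, and memcpy stands in for
the actual load instructions.

```c
#include <stdint.h>
#include <string.h>

/* Models the testcase's accesses: two zero-extending 16-bit loads
   at byte offsets 0 and 2 (hypothetical helper, for illustration). */
static void
load_two_halfwords (const unsigned char *p, uint32_t *a, uint32_t *b)
{
  uint16_t h0, h1;
  memcpy (&h0, p, sizeof h0);
  memcpy (&h1, p + 2, sizeof h1);
  *a = h0;  /* zero extension to 32 bits */
  *b = h1;
}

/* Models "ldp w0, w1, [x0]": two 32-bit loads at byte offsets 0 and 4
   (hypothetical helper, for illustration). */
static void
load_pair_words (const unsigned char *p, uint32_t *a, uint32_t *b)
{
  memcpy (a, p, sizeof *a);
  memcpy (b, p + 4, sizeof *b);
}
```

For a buffer holding the halfwords 1, 2, 3, 4, the first helper yields 1
and 2, while the second reads eight bytes and yields two full words, so the
register contents differ.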

Thoughts ...
