On 6/29/23 01:39, Christoph Müllner wrote:
On Wed, Jun 28, 2023 at 8:23 PM Jeff Law <jeffreya...@gmail.com> wrote:
On 6/28/23 06:39, Christoph Müllner wrote:
+;; XTheadMemIdx overview:
+;; All peephole passes attempt to improve the operand utilization of
+;; XTheadMemIdx instructions, where one sign or zero extended
+;; register-index-operand can be shifted left by a 2-bit immediate.
+;;
+;; The basic idea is the following optimization:
+;; (set (reg 0) (op (reg 1) (imm 2)))
+;; (set (reg 3) (mem (plus (reg 0) (reg 4)))
+;; ==>
+;; (set (reg 3) (mem (plus (reg 4) (op2 (reg 1) (imm 2))))
+;; This optimization only valid if (reg 0) has no further uses.
Couldn't this be done by combine if you created define_insn patterns
rather than define_peephole2 patterns? Similarly for the other cases
handled here.
I was inspired by XTheadMemPair, which merges two memory accesses
into a mem-pair instruction (and which got inspiration from
gcc/config/aarch64/aarch64-ldpstp.md).
Right. I'm pretty familiar with those. They cover a different case,
specifically the two insns being optimized don't have a true data
dependency between them. ie, the first instruction does not produce a
result used in the second insn.
In the case above there is a data dependency on reg0. ie, the first
instruction generates a result used in the second instruction. combine
is usually the best place to handle the data dependency case.
Ok, understood.
It is a bit of a special case here, because the peephole is restricted
to those cases, where reg0 is not used elsewhere (peep2_reg_dead_p()).
I have not seen how to do this for combiner optimizations.
If the value is used elsewhere, then the combiner will generate a
parallel with two sets. If the value dies, then the combiner generates
the one set. ie given
(set (t) (op0 (a) (b)))
(set (r) (op1 (c) (t)))
If "t" is dead, then combine will present you with:
(set (r) (op1 (c) (op0 (a) (b))))
If "t" is used elsewhere, then combine will present you with:
(parallel
[(set (r) (op1 (c) (op0 (a) (b))))
(set (t) (op0 (a) (b)))])
Which makes perfect sense if you think about it for a while. If you
still need "t", then the first sequence simply isn't valid as it doesn't
preserve that side effect. Hence it tries to produce a sequence with
the combined operation, but with the side effect of the first statement
included as well.
Jeff