On Thu, Jun 29, 2023 at 4:09 PM Jeff Law <jeffreya...@gmail.com> wrote:
>
>
>
> On 6/29/23 01:39, Christoph Müllner wrote:
> > On Wed, Jun 28, 2023 at 8:23 PM Jeff Law <jeffreya...@gmail.com> wrote:
> >>
> >>
> >>
> >> On 6/28/23 06:39, Christoph Müllner wrote:
> >>
> >>>>> +;; XTheadMemIdx overview:
> >>>>> +;; All peephole passes attempt to improve the operand utilization of
> >>>>> +;; XTheadMemIdx instructions, where one sign or zero extended
> >>>>> +;; register-index-operand can be shifted left by a 2-bit immediate.
> >>>>> +;;
> >>>>> +;; The basic idea is the following optimization:
> >>>>> +;; (set (reg 0) (op (reg 1) (imm 2)))
> >>>>> +;; (set (reg 3) (mem (plus (reg 0) (reg 4)))
> >>>>> +;; ==>
> >>>>> +;; (set (reg 3) (mem (plus (reg 4) (op2 (reg 1) (imm 2))))
> >>>>> +;; This optimization only valid if (reg 0) has no further uses.
> >>>> Couldn't this be done by combine if you created define_insn patterns
> >>>> rather than define_peephole2 patterns?  Similarly for the other cases
> >>>> handled here.
> >>>
> >>> I was inspired by XTheadMemPair, which merges two memory accesses
> >>> into a mem-pair instruction (and which got inspiration from
> >>> gcc/config/aarch64/aarch64-ldpstp.md).
> >> Right.  I'm pretty familiar with those.  They cover a different case,
> >> specifically the two insns being optimized don't have a true data
> >> dependency between them.  ie, the first instruction does not produce a
> >> result used in the second insn.
> >>
> >>
> >> In the case above there is a data dependency on reg0.  ie, the first
> >> instruction generates a result used in the second instruction.  combine
> >> is usually the best place to handle the data dependency case.
> >
> > Ok, understood.
> >
> > It is a bit of a special case here, because the peephole is restricted
> > to those cases, where reg0 is not used elsewhere (peep2_reg_dead_p()).
> > I have not seen how to do this for combiner optimizations.
> If the value is used elsewhere, then the combiner will generate a
> parallel with two sets.  If the value dies, then the combiner generates
> the one set.  ie given
>
> (set (t) (op0 (a) (b)))
> (set (r) (op1 (c) (t)))
>
> If "t" is dead, then combine will present you with:
>
> (set (r) (op1 (c) (op0 (a) (b))))
>
> If "t" is used elsewhere, then combine will present you with:
>
> (parallel
>    [(set (r) (op1 (c) (op0 (a) (b))))
>     (set (t) (op0 (a) (b)))])
>
> Which makes perfect sense if you think about it for a while.  If you
> still need "t", then the first sequence simply isn't valid as it doesn't
> preserve that side effect.  Hence it tries to produce a sequence with
> the combined operation, but with the side effect of the first statement
> included as well.

Thanks for this!
Of course I was "lucky" and ran into the issue that the patterns did not match,
because of unexpected MULT insns where ASHIFTs were expected.
But after reading enough of combiner.cc I understood that this is on purpose
(for addresses) and I have to adjust my INSNs accordingly.

I've changed the patches for XTheadMemIdx and XTheadFMemIdx and will
send out a new series.

Thanks,
Christoph

Reply via email to