On Thu, Jun 29, 2023 at 4:09 PM Jeff Law <jeffreya...@gmail.com> wrote: > > > > On 6/29/23 01:39, Christoph Müllner wrote: > > On Wed, Jun 28, 2023 at 8:23 PM Jeff Law <jeffreya...@gmail.com> wrote: > >> > >> > >> > >> On 6/28/23 06:39, Christoph Müllner wrote: > >> > >>>>> +;; XTheadMemIdx overview: > >>>>> +;; All peephole passes attempt to improve the operand utilization of > >>>>> +;; XTheadMemIdx instructions, where one sign or zero extended > >>>>> +;; register-index-operand can be shifted left by a 2-bit immediate. > >>>>> +;; > >>>>> +;; The basic idea is the following optimization: > >>>>> +;; (set (reg 0) (op (reg 1) (imm 2))) > >>>>> +;; (set (reg 3) (mem (plus (reg 0) (reg 4))) > >>>>> +;; ==> > >>>>> +;; (set (reg 3) (mem (plus (reg 4) (op2 (reg 1) (imm 2)))) > >>>>> +;; This optimization only valid if (reg 0) has no further uses. > >>>> Couldn't this be done by combine if you created define_insn patterns > >>>> rather than define_peephole2 patterns? Similarly for the other cases > >>>> handled here. > >>> > >>> I was inspired by XTheadMemPair, which merges two memory accesses > >>> into a mem-pair instruction (and which got inspiration from > >>> gcc/config/aarch64/aarch64-ldpstp.md). > >> Right. I'm pretty familiar with those. They cover a different case, > >> specifically the two insns being optimized don't have a true data > >> dependency between them. ie, the first instruction does not produce a > >> result used in the second insn. > >> > >> > >> In the case above there is a data dependency on reg0. ie, the first > >> instruction generates a result used in the second instruction. combine > >> is usually the best place to handle the data dependency case. > > > > Ok, understood. > > > > It is a bit of a special case here, because the peephole is restricted > > to those cases, where reg0 is not used elsewhere (peep2_reg_dead_p()). > > I have not seen how to do this for combiner optimizations. > If the value is used elsewhere, then the combiner will generate a > parallel with two sets. If the value dies, then the combiner generates > the one set. ie given > > (set (t) (op0 (a) (b))) > (set (r) (op1 (c) (t))) > > If "t" is dead, then combine will present you with: > > (set (r) (op1 (c) (op0 (a) (b)))) > > If "t" is used elsewhere, then combine will present you with: > > (parallel > [(set (r) (op1 (c) (op0 (a) (b)))) > (set (t) (op0 (a) (b)))]) > > Which makes perfect sense if you think about it for a while. If you > still need "t", then the first sequence simply isn't valid as it doesn't > preserve that side effect. Hence it tries to produce a sequence with > the combined operation, but with the side effect of the first statement > included as well.
Thanks for this! Of course I was "lucky" and ran into the issue that the patterns did not match, because of unexpected MULT insns where ASHIFTs were expected. But after reading enough of combiner.cc I understood that this is on purpose (for addresses) and I have to adjust my INSNs accordingly. I've changed the patches for XTheadMemIdx and XTheadFMemIdx and will send out a new series. Thanks, Christoph