On Tue, May 30, 2023 at 2:30 AM Jeff Law <jeffreya...@gmail.com> wrote:
>
>
>
> On 5/25/23 08:02, Manolis Tsamis wrote:
> > On Thu, May 25, 2023 at 4:53 PM Richard Biener via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
> >> <gcc-patches@gcc.gnu.org> wrote:
> >>>
> >>>
> >>>
> >>> On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
> >>>> On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis <manolis.tsa...@vrull.eu> 
> >>>> wrote:
> >>>>>
> >>>>> Implementation of the new RISC-V optimization pass for memory offset
> >>>>> calculations, documentation and testcases.
> >>>>
> >>>> Why do fwprop or combine not do what you want?
> >>> I think a lot of them end up coming from register elimination.
> >>
> >> Why isn't this a problem for other targets then?  Or maybe it is and this
> >> shouldn't be a machine specific pass?  Maybe postreload-gcse should
> >> perform strength reduction (I can't think of any other post reload pass
> >> that would do something even remotely related).
> >>
> >> Richard.
> >>
> >
> > It should be a problem for other targets as well (especially RISC-style 
> > ISAs).
> >
> > It can be easily seen by comparing the generated code for the
> > testcases: Example for testcase-2 on AArch64:
> > https://godbolt.org/z/GMT1K7Ebr
> > Although the patterns in the test cases are the simple ones (the
> > complex ones only manifest in complex programs), the point still
> > holds.
> > The code for this pass is quite generic and could work for most/all
> > targets if that would be interesting.
> Interestingly enough, fold-mem-offsets seems to interact strangely with the
> load/store pair support on aarch64.  Note how store2a uses 2 stp
> instructions on the trunk, but 4 str instructions with fold-mem-offsets.
> Yet in load1r we're able to generate a load-quad rather than two load
> pairs.  Weird.
>

I'm confused, where is this comparison from?
The fold-mem-offsets pass is only run on RISC-V and doesn't (shouldn't)
affect AArch64.

I only see the 2x stp / 4x str in the godbolt link, but that is gcc vs
clang, no fold-mem-offsets involved here.

Manolis

> jeff
