On Tue, May 30, 2023 at 2:30 AM Jeff Law <jeffreya...@gmail.com> wrote:
>
> On 5/25/23 08:02, Manolis Tsamis wrote:
> > On Thu, May 25, 2023 at 4:53 PM Richard Biener via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
> >> <gcc-patches@gcc.gnu.org> wrote:
> >>>
> >>> On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
> >>>> On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis <manolis.tsa...@vrull.eu>
> >>>> wrote:
> >>>>>
> >>>>> Implementation of the new RISC-V optimization pass for memory offset
> >>>>> calculations, documentation and testcases.
> >>>>
> >>>> Why do fwprop or combine not what you want to do?
> >>> I think a lot of them end up coming from register elimination.
> >>
> >> Why isn't this a problem for other targets then? Or maybe it is and this
> >> shouldn't be a machine specific pass? Maybe postreload-gcse should
> >> perform strength reduction (I can't think of any other post reload pass
> >> that would do something even remotely related).
> >>
> >> Richard.
> >>
> >
> > It should be a problem for other targets as well (especially RISC-style
> > ISAs).
> >
> > It can be easily seen by comparing the generated code for the
> > testcases: Example for testcase-2 on AArch64:
> > https://godbolt.org/z/GMT1K7Ebr
> > Although the patterns in the test cases are the ones that are simple
> > as the complex ones manifest in complex programs, the case still
> > holds.
> > The code for this pass is quite generic and could work for most/all
> > targets if that would be interesting.
> Interestingly enough, fold-mem-offsets seems to interact strangely with the
> load/store pair support on aarch64. Note how store2a uses 2 stp
> instructions on the trunk, but 4 str instructions with fold-mem-offsets.
> Yet in load1r we're able to generate a load-quad rather than two load
> pairs. Weird.
>
I'm confused, where is this comparison from? The fold-mem-offsets pass is
only run on RISC-V and doesn't (shouldn't) affect AArch64. I only see the
2x stp / 4x str difference in the godbolt link, but that is GCC vs. Clang;
no fold-mem-offsets is involved there.

Manolis

> jeff
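The thread only describes the pass as handling "memory offset calculations". As a rough illustration of that idea, here is a toy Python model that folds a constant `addi rd, rs, imm` into the offsets of later loads/stores that use `rd` as their base, then deletes the `addi`. It is a hypothetical sketch, not GCC's fold-mem-offsets implementation (which operates on RTL): the `Insn` class and `fold_mem_offsets` function are invented for illustration, and it assumes a single basic block, only `addi`/`ld`/`sd`, and that `rd` is dead after the block. The 12-bit signed offset range is the RISC-V load/store immediate range.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical toy IR, not GCC's RTL.
@dataclass
class Insn:
    op: str                    # "addi", "ld" or "sd"
    dst: Optional[str]         # destination register (addi/ld); None for sd
    base: str                  # addi source or ld/sd base register
    imm: int                   # addi immediate or ld/sd offset
    src: Optional[str] = None  # stored value register (sd only)

    def __str__(self) -> str:
        if self.op == "addi":
            return f"addi {self.dst}, {self.base}, {self.imm}"
        if self.op == "ld":
            return f"ld {self.dst}, {self.imm}({self.base})"
        return f"sd {self.src}, {self.imm}({self.base})"

def defs(i: Insn) -> Set[str]:
    return {i.dst} if i.dst else set()

def uses(i: Insn) -> Set[str]:
    return {i.base} | ({i.src} if i.src else set())

def fold_mem_offsets(insns: List[Insn]) -> List[Insn]:
    """Toy model: fold `addi rd, rs, k` into dependent ld/sd offsets.
    Assumes straight-line code and that rd is dead after the block."""
    out = list(insns)
    i = 0
    while i < len(out):
        a = out[i]
        if a.op != "addi" or a.dst == a.base:
            i += 1
            continue
        rd, rs, k = a.dst, a.base, a.imm
        mem_uses, ok = [], True
        for j in range(i + 1, len(out)):
            b = out[j]
            if rd in uses(b):
                # rd must be used purely as a memory base, and the folded
                # offset must still fit RISC-V's 12-bit signed immediate.
                if (b.op in ("ld", "sd") and b.base == rd and b.src != rd
                        and -2048 <= b.imm + k <= 2047):
                    mem_uses.append(j)
                else:
                    ok = False
                    break
            if rd in defs(b):
                break  # rd redefined: later uses see a different value
        # rs must keep its value until the last rewritten access.
        if ok and mem_uses and any(rs in defs(out[j])
                                   for j in range(i + 1, mem_uses[-1])):
            ok = False
        if ok and mem_uses:
            for j in mem_uses:
                b = out[j]
                out[j] = Insn(b.op, b.dst, rs, b.imm + k, b.src)
            del out[i]  # the addi is now dead
            continue
        i += 1
    return out

prog = [Insn("addi", "a1", "sp", 16),
        Insn("ld", "a0", "a1", 8),
        Insn("sd", None, "a1", 0, src="a0")]
for insn in fold_mem_offsets(prog):
    print(insn)  # ld a0, 24(sp) / sd a0, 16(sp)
```

Bailing out whenever `rd` has a non-memory use keeps the sketch conservative; a real pass would also have to reason about liveness across blocks.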