Re: [PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaliged access

Andrew Pinski via Gcc-patches Mon, 16 Aug 2021 11:56:30 -0700

On Mon, Aug 16, 2021 at 10:10 AM Palmer Dabbelt <pal...@dabbelt.com> wrote:
>
> On Mon, 16 Aug 2021 09:29:16 PDT (-0700), Kito Cheng wrote:
> >> > Could you submit v3 patch which is v1 with overlap_op_by_pieces field,
> >> > testcase from v2 and add a few more comments to describe the field?
> >> >
> >> > And add an -mtune=ultra-size to make it able to test without change
> >> > other behavior?
> >> >
> >> > Hi Palmer:
> >> >
> >> > Are you OK with that?
> >>
> >> I'm still not convinced on the performance: like Andrew and I pointed
> >> out, this is a difficult case for pipelines of this flavor to handle.
> >> Nobody here knows anything about this pipeline deeply enough to say
> >> anything difinitive, though, so this is really just a guess.
> >
> > So with an extra field to indicate should resolve that?
> > I believe people should only set overlap_op_by_pieces
> > to true only if they are sure it has benefits.
>
> My only issue there is that we'd have no way to turn it on, but see
> below...
>
> >> As I'm not convinced this is an obvious performance win I'm not going to
> >> merge it without a benchmark.  If you're convinced and want to merge it
> >> that's fine, I don't really care about the performance fo the C906 and
> >> if someone complains we can always just revert it later.
> >
> > I suppose Christoph has tried with their internal processor, and it's
> > benefit on performance,
> > but it can't be open-source yet, so v2 patch set using C906 to demo
> > and test that since that is
> > the only processor with slow_unaligned_access=False.
>
> Well, that's a very different discussion.  The C906 tuning model should
> be for the C906, not a proxy for some internal-only processor.  If the
> goal here is to allow this pass to be flipped on by an out-of-tree
> pipeline model then we can talk about it.
>
> > I agree on the C906 part, we never know it's benefit or not, so I propose
> > adding one -mtune=ultra-size to make this test-able rather than changing 
> > C906.
>
> That's essentially the same conclusion we came to last time this came
> up, except that we were calling it "-Oz" (because LLVM does).  I guess
> we never got around having the broader GCC discussion about "-Oz".  IIRC
> we had some other "-Oz" candidates we never got around to dealing with,
> but that was a while ago so I'm not sure if any of that panned out.


-Oz was a bad idea that Apple came up because GCC decided to start
emitting store multiple on PowerPC around 13 years ago.
I don't think we should repeat that mistake for GCC and especially for RISCV.
If people want to optimize for size, they get the performance issues.

Thanks,
Andrew Pinski

Re: [PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaliged access

Reply via email to