On Sat, Feb 18, 2023 at 1:30 PM Palmer Dabbelt <pal...@dabbelt.com> wrote:
>
> On Sat, 18 Feb 2023 13:06:02 PST (-0800), jeffreya...@gmail.com wrote:
> >
> >
> > On 2/18/23 11:26, Palmer Dabbelt wrote:
> >> On Fri, 17 Feb 2023 06:02:40 PST (-0800), gcc-patches@gcc.gnu.org wrote:
> >>> Hi all,
> >>> If we have division and remainder calculations with the same operands:
> >>>
> >>>   a = b / c;
> >>>   d = b % c;
> >>>
> >>> We can replace the calculation of remainder with multiplication +
> >>> subtraction, using the result from the previous division:
> >>>
> >>>   a = b / c;
> >>>   d = a * c;
> >>>   d = b - d;
> >>>
> >>> Which will be faster.
> >>
> >> Do you have any benchmarks that show that performance increase?  The ISA
> >> manual specifically says the suggested sequence is div+mod, and while
> >> those suggestions don't always pan out for real hardware it's likely
> >> that at least some implementations will end up with the ISA-suggested
> >> fusions.
> > It'll almost certainly be visible in mcf.  Been there, done that.  In
> > fact, that's why I asked the team Matevos works on to poke at this case
> > as I went through this issue on another processor.
> >
> > It can also be run through LLVM's MCA to estimate counts if you've got a
> > pipeline description.  The div+rem will come out at around 40c while a
> > div+mul+sub should weigh in around 25c for Veyron v1.
>
> Do you have a link to the patches somewhere?  I couldn't find them
> online, just the custom instruction support.  Or even just some docs
> describing what the pipeline does, as just basing one performance model
> on another is kind of a double-edged sword.
>
> That said, I think just knowing the processor doesn't do the div+mod
> fusion is sufficient to turn something like this on for the mtune for
> that processor.  That's different than turning it on globally, though --
> unless it turns out nobody is actually doing the fusion suggested in the
> ISA manual, which wouldn't be super surprising.
>
> Maybe some of the SiFive and T-Head folks can chime in on whether or not
> their processors perform the fusion in question -- and if so, do the
> instructions need to stay back-to-back?

AFAIK, the sequence with the multiplication will normally be faster on
SiFive cores when both the quotient and the remainder are needed.
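
For reference, here is a minimal C sketch of the two forms being compared.
The function names are illustrative only and do not come from the patch.
It assumes c != 0 and excludes the INT64_MIN / -1 case, both of which are
undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

/* Remainder computed directly: next to b / c this is the div+rem pair. */
static int64_t rem_direct(int64_t b, int64_t c)
{
    return b % c;
}

/* Remainder rebuilt from the quotient: div + mul + sub.
 * C division truncates toward zero, so b == (b / c) * c + (b % c),
 * and |a * c| <= |b| means the multiply cannot overflow. */
static int64_t rem_from_quot(int64_t b, int64_t c)
{
    int64_t a = b / c;   /* quotient, needed by the caller anyway */
    int64_t d = a * c;   /* multiply back */
    return b - d;        /* subtract to recover the remainder */
}

/* Illustrative helper: abort if the two forms ever disagree. */
static void check_rem_forms(int64_t b, int64_t c)
{
    assert(rem_direct(b, c) == rem_from_quot(b, c));
}
```

Both forms give the same result for every valid input. The only question
is which sequence is cheaper on a given pipeline.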

>  It doesn't look like we're
> really targeting the code sequences the ISA suggests as it stands, so
> maybe it's OK to just switch the default over too?
>
> It also brings up the question of mulh+mul fusions, which I don't think
> we've really looked at (though maybe they're a lot less important for
> rv64).
