On Fri, 7 Jun 2024 at 19:42, Stephen Face <shpf...@gmail.com> wrote:
>
> On 6/7/24 2:30 AM, Jonathan Wakely wrote:
> > On Fri, 7 Jun 2024 at 09:57, Sam James <s...@gentoo.org> wrote:
> >>
> >> Stephen Face <shpf...@gmail.com> writes:
> >>
> >>> This patch is to optimize the runtime execution of gcd. Mathematically,
> >>> it computes with the same algorithm as before, but subtractions and
> >>> branches are rearranged to encourage generation of code that can use
> >>> flags from the subtractions for conditional moves. Additionally, most
> >>> pairs of integers are coprime, so this patch also includes a check for
> >>> one of the integers to be equal to 1, and then it will exit the loop
> >>> early in this case.
> >>
> >> Is it worth filing a bug for the missed optimisation? You shouldn't have
> >> to write things in a specific order. Thanks.
>
> This patch is not a pure reordering. It has the added comparison with 1.
> Also, the argument to the trailing zero count is now the result of the
> subtraction before the comparison of m and n. This shortens the chain of
> dependencies, which can help the loop run faster.
>
> >
> > Yes, I think a bug report would be good. But 20%-60% decreases in run
> > time seems significant enough that we should take the libstdc++ patch
> > now, rather than wait for a possible compiler fix to come later.
> >
> > Stephen, could you please confirm whether you have a copyright
> > assignment in place for GCC, and if not whether you would be will to
> > complete one, or alternatively contribute this under the DCO terms:
> > https://gcc.gnu.org/dco.html
> > Thanks!
>
> I do not have a copyright assignment in place for GCC. I can certify to
> the DCO terms for this contribution.
>
>     Signed-off-by: Stephen Face <shpf...@gmail.com>

Thank you! I'll take care of this patch next week.


>
> >
> >
> >>
> >>>
> >>> libstdc++-v3/ChangeLog:
> >>>
> >>>       * include/std/numeric(__gcd): Optimize.
> >>> ---
> >>> I have tested this on x86_64-linux and aarch64-linux. I have tested the
> >>> timing with random distributions of small inputs and large inputs on a
> >>> couple of machines with -O2 and found decreases in execution time from
> >>> 20% to 60% depending on the machine and distribution of inputs.
> >>>
> >>>  libstdc++-v3/include/std/numeric | 21 +++++++++++----------
> >>>  1 file changed, 11 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/libstdc++-v3/include/std/numeric 
> >>> b/libstdc++-v3/include/std/numeric
> >>> index c912db4a519..3c9e8387a0e 100644
> >>> --- a/libstdc++-v3/include/std/numeric
> >>> +++ b/libstdc++-v3/include/std/numeric
> >>> @@ -148,19 +148,20 @@ namespace __detail
> >>>
> >>>        while (true)
> >>>       {
> >>> -       if (__m > __n)
> >>> -         {
> >>> -           _Tp __tmp = __m;
> >>> -           __m = __n;
> >>> -           __n = __tmp;
> >>> -         }
> >>> +       _Tp __m_minus_n = __m - __n;
> >>> +       if (__m_minus_n == 0)
> >>> +         return __m << __k;
> >>>
> >>> -       __n -= __m;
> >>> +       _Tp __next_m = __m < __n ? __m : __n;
> >>>
> >>> -       if (__n == 0)
> >>> -         return __m << __k;
> >>> +       if (__next_m == 1)
> >>> +         return __next_m << __k;
> >>> +
> >>> +       _Tp __n_minus_m = __n - __m;
> >>> +       __n = __n < __m ? __m_minus_n : __n_minus_m;
> >>> +       __m = __next_m;
> >>>
> >>> -       __n >>= std::__countr_zero(__n);
> >>> +       __n >>= std::__countr_zero(__m_minus_n);
> >>>       }
> >>>      }
> >>>  } // namespace __detail
> >>
> >
>

Reply via email to