memset inline strategies for -mtune=generic

H.J. Lu via Gcc-patches Mon, 22 Mar 2021 06:39:24 -0700

On Mon, Mar 22, 2021 at 6:29 AM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Mon, Mar 22, 2021 at 2:19 PM H.J. Lu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Simplify memcpy and memset inline strategies to avoid branches for
> > -mtune=generic:
> >
> > 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
> >    load and store for up to 16 * 16 (256) bytes when the data size is
> >    fixed and known.
> > 2. Inline only if data size is known to be <= 256.
> >    a. Use "rep movsb/stosb" with simple code sequence if the data size
> >       is a constant.
> >    b. Use loop if data size is not a constant.
> > 3. Use memcpy/memset library function if data size is unknown or > 256.
> >
> > With -mtune=generic -O2,
>
> Is there any visible code-size effect of increasing CLEAR_RATIO on


Hongyue, please collect code size differences on SPEC CPU 2017 and
eembc.

> SPEC/eembc?  Did you play with other values of MOVE/CLEAR_RATIO?
> 17 memory-to-memory/memory-clear insns looks quite a lot.
>

Yes, we did.  256 bytes is the threshold above which memcpy/memset in libc
win.  Below 256 bytes, 16 by_pieces moves/stores are faster.
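
For illustration only, here is a small sketch of the three size cases from
the quoted list.  The function and type names are hypothetical and are not
part of the patch.  Which expansion GCC actually picks for each call depends
on the compiler version and on the range information available, so treat the
comments as the intent described in this thread, not as guaranteed output.
One way to check is to compile with "gcc -O2 -mtune=generic -S" and read the
assembly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example type: a buffer of exactly 256 bytes. */
struct block { unsigned char bytes[256]; };

/* Size is a compile-time constant (256 bytes, i.e. 16 * 16), so this
   call is a candidate for inline integer/vector stores. */
void clear_fixed(struct block *b)
{
    memset(b->bytes, 0, sizeof b->bytes);
}

/* Size is unknown at compile time: per the list above, this is
   expected to stay a call to the library memset. */
void clear_unknown(unsigned char *p, size_t n)
{
    memset(p, 0, n);
}

/* Size is not a constant, but the guard bounds it to <= 256 on the
   first path (assuming the compiler propagates that range).  Larger
   sizes are forwarded to the unknown-size path. */
void clear_bounded(unsigned char *p, size_t n)
{
    if (n <= 256)
        memset(p, 0, n);
    else
        clear_unknown(p, n);
}
```

All three functions clear the same bytes whatever strategy the compiler
chooses; only the generated code differs.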

-- 
H.J.

Re: [PATCH 3/3] x86: Update memcpy/memset inline strategies for -mtune=generic
