On Mon, Mar 22, 2021 at 6:29 AM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Mon, Mar 22, 2021 at 2:19 PM H.J. Lu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Simplify memcpy and memset inline strategies to avoid branches for
> > -mtune=generic:
> >
> > 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
> >    load and store for up to 16 * 16 (256) bytes when the data size is
> >    fixed and known.
> > 2. Inline only if data size is known to be <= 256.
> >    a. Use "rep movsb/stosb" with simple code sequence if the data size
> >       is a constant.
> >    b. Use loop if data size is not a constant.
> > 3. Use memcpy/memset library function if data size is unknown or > 256.
> >
> > With -mtune=generic -O2,
>
> Is there any visible code-size effect of increasing CLEAR_RATIO on
> SPEC/eembc?

Hongyue, please collect code size differences on SPEC CPU 2017 and
eembc.

> Did you play with other values of MOVE/CLEAR_RATIO?
> 17 memory-to-memory/memory-clear insns looks quite a lot.
>

Yes, we did.  256 bytes is the threshold above which memcpy/memset in
libc win.  Below 256 bytes, 16 by_pieces moves/stores are faster.

-- 
H.J.