> > > > Patch is OK now. I was wondering about using avx256 for moves of known
> > Done. X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now. Can
> you take a look at the patch for Skylake:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html

I was wondering: if the CPU prefers rep movsb when rcx is a compile-time
constant, it probably does some logic at decode time (i.e. expands it
into some sequence). If so, it may require the code setting the register
to be near the rep (via fusing or a similar mechanism). Perhaps we want
to have a fusing pattern for this, so we do not move them far apart?

> > > size (per comment on MOVE_MAX_PIECES there is issue with
> > MAX_FIXED_MODE_SIZE, but that seems not hard to fix). Did you look into
> > it?
> > It requires some changes in the middle-end. See

Yep, I know; I tried that too for the zen3 tuning :)

> users/hjl/pieces/master branch:
>
> https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/pieces/master
>
> I am rebasing it.

Thanks. It would also help to reduce the code size bloat by bumping up
the move-by-pieces size. Clang is using those.

Honza
>
> --
> H.J.
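To make the code-size point about bumping up the move-by-pieces size concrete, here is a minimal C sketch. It assumes a simple greedy expansion that always uses the widest power-of-two move that still fits. Real expansions may also use overlapping moves and target-specific cost checks, which this model ignores. `count_pieces` and `copy_by_pieces` are hypothetical helpers, not GCC code. `max_piece` stands in for the largest single move width, e.g. 16 bytes for SSE versus 32 bytes for 256-bit AVX moves.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not GCC's implementation: number of moves a greedy
   "move by pieces" expansion emits for an n-byte copy when the widest
   single move is max_piece bytes (a power of two, e.g. 16 or 32).
   Each step uses the widest piece that still fits; no overlapping moves. */
static size_t count_pieces(size_t n, size_t max_piece)
{
    size_t moves = 0;
    for (size_t piece = max_piece; piece >= 1; piece /= 2) {
        moves += n / piece;   /* full pieces of this width */
        n %= piece;           /* remainder is left for narrower pieces */
    }
    return moves;
}

/* Same greedy strategy, actually performing the copy.  Each memcpy call
   stands in for one load/store pair of width `piece` in expanded code. */
static void copy_by_pieces(unsigned char *dst, const unsigned char *src,
                           size_t n, size_t max_piece)
{
    for (size_t piece = max_piece; piece >= 1; piece /= 2) {
        while (n >= piece) {
            memcpy(dst, src, piece);
            dst += piece;
            src += piece;
            n -= piece;
        }
    }
}
```

Under this model, a 100-byte copy of known size takes 4 moves with 32-byte pieces (3 × 32 + 1 × 4) versus 7 with 16-byte pieces (6 × 16 + 1 × 4). That is the kind of code-size difference a larger piece size would give for fixed-size copies.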