> >
> > Patch is OK now.  I was wondering about using avx256 for moves of known
> 
> Done.   X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now.   Can
> you take a look at the patch for Skylake:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html

I was wondering: if the CPU prefers rep movsb when rcx is a compile-time
constant, it probably does some logic at decode time (i.e. expands
it into some sequence), and if so, it may require the code setting
the register to be near the rep (via fusing or a similar mechanism).

Perhaps we want to have a fusing pattern for this, so we do not move them
far apart?
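
To make the concern concrete, here is a minimal sketch of the case I have
in mind. The helper names (copy_rep_movsb, copy_256) are hypothetical and
not taken from the patch; the inline asm only stands in for what the
expander would emit, and it falls back to memcpy on non-x86-64 targets.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes with "rep movsb" on x86-64.
   The count lives in rcx ("+c"), the pointers in rdi/rsi.  */
static inline void copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile ("rep movsb"
                      : "+D" (dst), "+S" (src), "+c" (n)
                      :
                      : "memory");
#else
    memcpy(dst, src, n);   /* portable fallback, not the case under discussion */
#endif
}

/* Known-size copy: the count 256 is a compile-time constant.  The
   instruction that loads it into rcx is an ordinary move outside the
   asm, so nothing ties it to the rep movsb and the scheduler can
   place other instructions between the two.  */
void copy_256(char *dst, const char *src)
{
    copy_rep_movsb(dst, src, 256);
}
```

Whether the distance between the constant load and the rep actually matters
to the hardware is exactly the open question; the sketch only shows where
the two instructions come from.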
> 
> > size (per comment on MOVE_MAX_PIECES there is issue with
> > MAX_FIXED_MODE_SIZE, but that seems not hard to fix). Did you look into
> > it?
> 
> It requires some changes in the middle-end.   See

yep, I know - tried that too for zen3 tuning :)
> users/hjl/pieces/master branch:
> 
> https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/pieces/master
> 
> I am rebasing it.

Thanks, it would also help to reduce the code size bloat by bumping up
the move-by-pieces size. Clang is using those.

Honza
> 
> -- 
> H.J.
