> > >
> > > Patch is OK now.  I was wondering about using avx256 for moves of known
> > 
> > Done.   X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now.   Can
> > you take a look at the patch for Skylake:
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html
> 
> I was wondering: if the CPU prefers rep movsb when rcx is a compile-time
> constant, it probably does some logic at decode time (i.e. expands
> it into some sequence), and if so, it may require the code setting
> the register to be near the rep (via fusing or a similar mechanism).
> 
> Perhaps we want to have a fusing pattern for this, so we do not move them
> far apart?

Reading through the optimization manual, it seems that movsb is fast for
small blocks regardless of whether the size is hard-wired.  In that case you
probably want to check whether max_size or expected_size is known to be
small, rather than requiring max_size == min_size with both being small.
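
To make the difference between the two checks concrete, here is a rough
sketch in C.  It is not the code from the patch: struct size_info, the
two predicates and SMALL_BLOCK_LIMIT are hypothetical names made up for
illustration, -1 is assumed to mean "unknown", and the threshold value
is a placeholder rather than a measured cutoff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cutoff for a "small" block; the real value would have to
   come from measurements on the target CPU.  */
#define SMALL_BLOCK_LIMIT 128

/* Hypothetical summary of what is known about a block size at compile
   time.  A value of -1 is assumed to mean "unknown".  */
struct size_info
{
  int64_t min_size;      /* lower bound, always known (0 if no better) */
  int64_t max_size;      /* upper bound, or -1 if unknown */
  int64_t expected_size; /* profile/heuristic estimate, or -1 if unknown */
};

/* Stricter check: the size is a compile-time constant and small.  */
static bool
size_is_small_constant (const struct size_info *s)
{
  return s->max_size >= 0
	 && s->min_size == s->max_size
	 && s->max_size <= SMALL_BLOCK_LIMIT;
}

/* Looser check: the size is bounded by, or expected to be, a small
   value; it does not need to be a constant.  */
static bool
size_is_known_or_expected_small (const struct size_info *s)
{
  if (s->max_size >= 0 && s->max_size <= SMALL_BLOCK_LIMIT)
    return true;
  if (s->expected_size >= 0 && s->expected_size <= SMALL_BLOCK_LIMIT)
    return true;
  return false;
}
```

With these definitions, a copy with min_size 0 and max_size 64 is
rejected by the first predicate but accepted by the second, which is the
case the looser check is meant to pick up.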

But it depends on what the CPU really does.
Honza
