On Fri, Jun 13, 2025 at 3:15 PM Cui, Lili <lili....@intel.com> wrote:

> > > On Mon, Apr 21, 2025 at 7:24 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > >
> > > > On Sun, Apr 20, 2025 at 6:31 PM Jan Hubicka <hubi...@ucw.cz> wrote:
> > > > >
> > > > > >       PR target/102294
> > > > > >       PR target/119596
> > > > > >       * config/i386/x86-tune-costs.h (generic_memcpy): Updated.
> > > > > >       (generic_memset): Likewise.
> > > > > >       (generic_cost): Change CLEAR_RATIO to 17.
> > > > > > >       * config/i386/x86-tune.def (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB):
> > > > > > >       Add m_GENERIC.
> > > > >
> > > > > > Looking through the PRs, they are primarily about CLEAR_RATIO
> > > > > > being lower than in clang, which makes us produce a slower (but
> > > > > > smaller) initialization sequence for blocks of certain sizes.
> > > > > > It seems the kernel is discussed there too (-mno-sse).
> > > > >
> > > > > Bumping it up for SSE makes sense provided that SSE codegen does
> > > > > not suffer from the long $0 immediates. I would say it is OK also
> > > > > for -mno-sse provided speedups are quite noticeable, but it would
> > > > > be really nice to solve this incrementally.
> > > > >
> > > > > > Concerning X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB, my understanding
> > > > > > is that Intel chips like stosb for small blocks, since they are
> > > > > > not optimized for stosw/q.  Zen seems to prefer stosq over stosb
> > > > > > for blocks up to 128 bytes.
> > > > >
> > > > > > How does the loop version compare to stosb for blocks in the range
> > > > > > 1...128 bytes on Intel hardware?
> > > > >
> > > > > > For the case where we can prove the block size to be small but do
> > > > > > not know the exact size, I think using a loop or unrolled code for
> > > > > > blocks up to, say, 128 bytes may work well for both.

Perhaps someone is interested in the following thread from LKML:

"[PATCH v2] x86: prevent gcc from emitting rep movsq/stosq for inlined ops"

https://lore.kernel.org/lkml/20250605164733.737543-1-mjgu...@gmail.com/

There are several PRs regarding memcpy/memset linked from the above message.

Please also note a message from Linus in the above thread:

https://lore.kernel.org/lkml/CAHk-=wg1qqlwkpyvxxznxwbot48--lkjucjjf8phdhrxv0u...@mail.gmail.com/

Uros,