https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119596

--- Comment #8 from Mateusz Guzik <mjguzik at gmail dot com> ---
(In reply to Andrew Pinski from comment #6)
> (In reply to Mateusz Guzik from comment #4)
> > The gcc default for the generic target is poor. rep is known to be a problem
> > on most uarchs.
>
> Is it though? Or is it only poor on Intel ones?
>
> With -mtune=intel, I don't get `rep movsq`
> Because with -mtune=znver2/3/4/5 I do.
>
> Again as I mentioned please benchmark on more than just one processors and
> such.

I verified clang also emits regular stores for zen.

I do agree tests on more CPUs are needed. The one I reported on is
significant in that FSRM was supposed to sort out some of it.

I may not be in a position to run the exact same test on AMD CPUs, unfortunately.

Is there a set of benches you guys have for these?

What I should be able to do is run an existing bench suite (if it is manageable
to set up) or do something rather primitive like issuing the relevant
memset/memcpys in a loop and checking ops/s.
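
For concreteness, the primitive variant could look roughly like the sketch
below. Everything named here is made up for illustration: BENCH_SIZE (an
arbitrary constant, kept compile-time so the compiler is free to expand the
calls inline), the buffer names, and both helper functions. The empty asm
statement is a GCC/clang-specific compiler barrier, and clock_gettime assumes
a POSIX system.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Assumption: arbitrary size, not taken from the bug report. Kept as a
 * compile-time constant so the compiler may expand the calls inline. */
#define BENCH_SIZE 200

static unsigned char bench_src[BENCH_SIZE];
static unsigned char bench_dst[BENCH_SIZE];

/* Hypothetical helper: monotonic clock reading in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Issue `iters` memset+memcpy pairs of BENCH_SIZE bytes and return the
 * number of pairs completed per second (0.0 if no time was measured). */
static double bench_ops_per_sec(unsigned long iters)
{
    double start = now_sec();

    for (unsigned long i = 0; i < iters; i++) {
        memset(bench_src, (int)(i & 0xff), BENCH_SIZE);
        memcpy(bench_dst, bench_src, BENCH_SIZE);
        /* Compiler barrier (GCC/clang syntax): keeps the stores from
         * being optimized away or hoisted out of the loop. */
        __asm__ __volatile__("" : : "r"(bench_src), "r"(bench_dst) : "memory");
    }

    double elapsed = now_sec() - start;
    return elapsed > 0.0 ? (double)iters / elapsed : 0.0;
}
```

Calling bench_ops_per_sec() from a main() and printing the result, then
building the same file with different -mtune= values at -O2, would give a
rough ops/s comparison. Checking the generated assembly first would confirm
which expansion is actually being measured.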
