https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119596
--- Comment #8 from Mateusz Guzik <mjguzik at gmail dot com> ---
(In reply to Andrew Pinski from comment #6)
> (In reply to Mateusz Guzik from comment #4)
> > The gcc default for the generic target is poor. rep is known to be a problem
> > on most uarchs.
>
> Is it though? Or is it only poor on Intel ones?
>
> With -mtune=intel, I don't get `rep movsq`
> Because with -mtune=znver2/3/4/5 I do.
>
> Again as I mentioned please benchmark on more than just one processors and
> such.

I verified clang also emits regular stores for zen.

I do agree tests on more CPUs are needed; the one I reported on is significant
in that FSRM was supposed to sort out some of this.

I may not be in a position to run the exact same test on AMD CPUs,
unfortunately. Is there a set of benches you guys have for these?

What I should be able to do is run an existing bench suite (if it is manageable
to set up) or do something rather primitive, like issuing the relevant
memset/memcpy calls in a loop and checking ops/s.
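
For reference, the "rather primitive" loop could look roughly like the sketch below. This is not an existing GCC bench; COPY_SIZE, the helper names and the BENCH_STANDALONE guard are all made up for illustration, and the 256-byte default is only a placeholder for whatever size the code of interest uses. The size is a compile-time constant on purpose, so the compiler is free to expand the call inline according to -mtune rather than calling into libc.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Assumption: a compile-time constant size, so the compiler may expand the
 * call inline. Override with -DCOPY_SIZE=... to match the code of interest. */
#ifndef COPY_SIZE
#define COPY_SIZE 256
#endif

static unsigned char src_buf[COPY_SIZE];
static unsigned char dst_buf[COPY_SIZE];

/* Wall-clock time in seconds (C11 timespec_get). */
static double now_sec(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Compiler-only barrier (GNU extension, accepted by gcc and clang): stops the
 * optimizer from eliding or merging the stores across iterations. */
static inline void barrier(void *p)
{
    __asm__ volatile("" : : "r"(p) : "memory");
}

/* Hypothetical helper: issue `iters` fixed-size memsets, return ops/s. */
static double bench_memset(long iters)
{
    double start = now_sec();
    for (long i = 0; i < iters; i++) {
        memset(dst_buf, 0x5a, COPY_SIZE);
        barrier(dst_buf);
    }
    double elapsed = now_sec() - start;
    return elapsed > 0.0 ? (double)iters / elapsed : 0.0;
}

/* Hypothetical helper: issue `iters` fixed-size memcpys, return ops/s. */
static double bench_memcpy(long iters)
{
    double start = now_sec();
    for (long i = 0; i < iters; i++) {
        memcpy(dst_buf, src_buf, COPY_SIZE);
        barrier(dst_buf);
    }
    double elapsed = now_sec() - start;
    return elapsed > 0.0 ? (double)iters / elapsed : 0.0;
}

#ifdef BENCH_STANDALONE
int main(void)
{
    const long iters = 100000000L;

    memset(src_buf, 0xa5, sizeof src_buf);
    printf("memset(%d): %.0f ops/s\n", (int)COPY_SIZE, bench_memset(iters));
    printf("memcpy(%d): %.0f ops/s\n", (int)COPY_SIZE, bench_memcpy(iters));
    return 0;
}
#endif
```

Built e.g. as `gcc -O2 -mtune=generic -DBENCH_STANDALONE bench.c` and again with `-mtune=intel` or `-mtune=znver4`, the two binaries can be compared on the same machine. Running the same command with `-S` instead shows whether the loop body really contains `rep movsq`/`rep stosq` for the chosen size, which is worth checking before trusting any numbers.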