https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119596
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
> Benching based on the Linux kernel and the Sapphire Rapids CPU:

With -mtune=sapphirerapids, GCC produces:
```
_Z4zeroP3foo:
.LFB0:
        .cfi_startproc
        mov     QWORD PTR [rdi], 0
        mov     QWORD PTR [rdi+8], 0
        mov     QWORD PTR [rdi+16], 0
        mov     QWORD PTR [rdi+24], 0
        mov     QWORD PTR [rdi+32], 0
        mov     BYTE PTR [rdi+40], 0
        ret
```
Which is what you want.

Again, I will mention this: for generic tuning you need to benchmark on more than just one processor (at least a few Intel and AMD processors).
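For readers following along, the listing above stores five 8-byte zeros and one 1-byte zero, i.e. it clears 41 bytes through the pointer in rdi, and the symbol _Z4zeroP3foo demangles to zero(foo*). A minimal sketch of source with that shape follows. The struct layout and the use of memset are assumptions made for illustration only; the actual test case attached to this bug may differ, and this sketch is not claimed to reproduce the exact output.

```cpp
#include <cstring>

// Hypothetical layout (an assumption): 41 bytes, which matches
// five 8-byte stores plus one 1-byte store in the listing.
struct foo {
    char bytes[41];
};

// Under the Itanium C++ ABI this function mangles to _Z4zeroP3foo.
void zero(foo *p) {
    // Clear the whole object; the compiler is free to expand this
    // into inline stores or a string instruction depending on tuning.
    std::memset(p, 0, sizeof *p);
}
```

Compiling something like this with `g++ -O2 -mtune=sapphirerapids -masm=intel -S` and again with a different -mtune value is one way to compare how the tuning choice affects the expansion.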