On Mon, Jul 29, 2019 at 01:32:46PM -0700, Nathan Chancellor wrote:
> For the record:
>
> https://godbolt.org/z/z57VU7
>
> This seems consistent with what Michael found so I don't think a revert
> is entirely unreasonable.

Try this:  https://godbolt.org/z/6_ZfVi

This matters in non-trivial loops, for example.  But all current cases
where such non-trivial loops are done with cache block instructions are
actually written in real assembler already, using two registers.
Because performance matters.  Not that I recommend writing code as
critical as memset in C with inline asm :-)


Segher