https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116713

--- Comment #4 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to pietro from comment #3)
> It looks like it's a more general GCC issue. The prefetch gets moved on both
> x86_64 and aarch64 on GCC, but not on clang: https://godbolt.org/z/Ycjr7Tq8b
> 
> > It looks like the problem can be "fixed" by inserting a 
> > '__atomic_thread_fence (1);' before the '__builtin_prefetch', which kinda 
> > makes sense.
> 
> The thread fence doesn't fix the prefetch move on x86_64, but the empty
> "asm" trick does: https://godbolt.org/z/5G8qe4o1n

I think you have a valid point there.  If somebody cares enough to insert a
'__builtin_prefetch', it can be assumed that the particular placement is
intentional.  It's quite unexpected, and possibly even counterproductive, for
the compiler to reorder it.

So perhaps it's better if it always implicitly acts as a barrier.
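A minimal sketch of the empty "asm" workaround from the quoted comment, assuming the trick is a GCC-style compiler barrier ('__asm__ __volatile__ ("" ::: "memory")') placed right before the prefetch, in the same position where '__atomic_thread_fence (1);' was suggested. The macro COMPILER_BARRIER and the function sum_with_prefetch are hypothetical names used only for illustration. Whether the prefetch really stays in place has to be confirmed by looking at the generated assembly (e.g. with -O2 -S), as in the linked Compiler Explorer examples.

```c
#include <stddef.h>

/* Hypothetical helper: an empty asm with a "memory" clobber emits no
   instruction, but tells GCC that memory may be read or written here, so
   memory accesses are not moved across it. Assumption: this matches the
   "empty asm trick" from the linked example. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Hypothetical example: sum an array while prefetching 'distance'
   elements ahead of the current position. */
long sum_with_prefetch(const long *data, size_t n, size_t distance)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Written as 'distance < n - i' to avoid overflow in i + distance. */
        if (distance < n - i) {
            /* Pin the prefetch at this point in the loop body. */
            COMPILER_BARRIER();
            /* rw = 0 (read), locality = 3 (keep in all cache levels). */
            __builtin_prefetch(&data[i + distance], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

The barrier only constrains the compiler's reordering; it does not change what the CPU does with the prefetch. That is roughly the behavior comment #4 suggests '__builtin_prefetch' could provide implicitly, without the extra asm statement.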
