GCC reorders the prefetch instruction emitted for __builtin_prefetch, but
clang does not [0]. GCC's behavior is surprising: when you use the builtin,
you want the instruction placed exactly where you wrote it. Moving it around,
especially across loads and stores, can turn it into a pessimization. Emitting
a blockage instruction before the prefetch prevents the scheduler from moving
it.

[0] https://godbolt.org/z/Ycjr7Tq8b
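For illustration, here is a minimal sketch of the kind of code where placement
matters, assuming the usual software-prefetch pattern of requesting data a
fixed distance ahead of the current load. PREFETCH_DIST and sum_with_prefetch
are hypothetical names and the distance of 16 is an arbitrary example value,
not something taken from the godbolt link or the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning value: how many elements ahead to prefetch.
   A good distance depends on the target and on the loop body. */
#define PREFETCH_DIST 16

/* Sum an array while prefetching the element PREFETCH_DIST ahead.
   The prefetch is written before the load of a[i] on purpose; if the
   compiler schedules it elsewhere, the hint may arrive later than the
   programmer intended. */
long sum_with_prefetch(const long *a, size_t n)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only form addresses inside the array: pointer arithmetic beyond
           one-past-the-end is undefined, even though a prefetch of an
           invalid address does not fault.  rw = 0 means read, and
           locality = 3 asks to keep the line in all cache levels. */
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 3);
        sum += a[i];
    }
    return sum;
}
```

Compiling this with both compilers at -O2 and reading the assembly (e.g. with
-S) is a quick way to see where the prefetch instruction lands relative to the
load of a[i].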


-- 8< --


diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 37c7c98e5c..fec751e0d6 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -1329,7 +1329,12 @@ expand_builtin_prefetch (tree exp)
       create_integer_operand (&ops[1], INTVAL (op1));
       create_integer_operand (&ops[2], INTVAL (op2));
       if (maybe_expand_insn (targetm.code_for_prefetch, 3, ops))
-       return;
+        {
+          /* Prevent the prefetch from being moved.  */
+          rtx_insn *last = get_last_insn ();
+          emit_insn_before (gen_blockage (), last);
+          return;
+        }
     }
 
   /* Don't do anything with direct references to volatile memory, but
