On Fri, 2024-09-27 at 00:26 -0400, Pietro Monteiro wrote:
> The prefetch instruction that is emitted by __builtin_prefetch is re-ordered 
> on GCC, but not on clang[0]. GCC's behavior is surprising because when using 
> the builtin you want the instruction to be placed at the exact point where 
> you put it. Moving it around, specially across load/stores, may end up being 
> a pessimization. Adding a blockage instruction before the prefetch prevents 
> the scheduler from moving it.
> 
> [0] https://godbolt.org/z/Ycjr7Tq8b
> 
> 

As pointed out by Andrew in the PR, this issue has been around for a while:

see
https://gcc.gnu.org/legacy-ml/gcc-patches/2009-05/msg01233.html
and
https://gcc.gnu.org/legacy-ml/gcc-patches/2009-09/msg00130.html

However, those fix the reordering only for sched2.  Hypothetically the
reordering could happen also before sched2 by some other RTL pass.  So
perhaps it's better to insert the barrier right when the builtin is emitted.

Best regards,
Oleg Endo






> -- 8< --
> 
> 
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 37c7c98e5c..fec751e0d6 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -1329,7 +1329,12 @@ expand_builtin_prefetch (tree exp)
>        create_integer_operand (&ops[1], INTVAL (op1));
>        create_integer_operand (&ops[2], INTVAL (op2));
>        if (maybe_expand_insn (targetm.code_for_prefetch, 3, ops))
> -     return;
> +        {
> +          /* Prevent the prefetch from being moved.  */
> +          rtx_insn *last = get_last_insn ();
> +          emit_insn_before (gen_blockage (), last);
> +          return;
> +        }
>      }
>  
>    /* Don't do anything with direct references to volatile memory, but

Reply via email to