On 10/01/17 17:18, Wilco Dijkstra wrote:
> My previous change to the Cortex-A53 scheduler resulted in a 13% regression on a
> proprietary benchmark.  This turned out to be due to non-optimal scheduling of int
> to float conversions.  This patch separates int to FP transfers from int to float
> conversions based on experiments to determine the best schedule.  As a result of
> these tweaks the performance of the benchmark improves by 20%.
> 
> ChangeLog:
> 2017-01-10  Wilco Dijkstra  <wdijk...@arm.com>
> 
>       * config/arm/cortex-a53.md: Add bypasses for
>       cortex_a53_r2f_cvt.
>       (cortex_a53_r2f): Only use for transfers.
>       (cortex_a53_f2r): Likewise.
>       (cortex_a53_r2f_cvt): Add reservation for conversions.
>       (cortex_a53_f2r_cvt): Likewise.
> 

OK.

R.

> --
> 
> diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
> index 14822ba0ac0532aaf0dd29cff7a87e32e745cbe8..b367ad403a4a641da34521c17669027b87092737 100644
> --- a/gcc/config/arm/cortex-a53.md
> +++ b/gcc/config/arm/cortex-a53.md
> @@ -252,9 +252,18 @@
>                "cortex_a53_r2f")
>  
>  (define_bypass 1 "cortex_a53_mul,
> -               cortex_a53_load*"
> +               cortex_a53_load1,
> +               cortex_a53_load2"
>                "cortex_a53_r2f")
>  
> +(define_bypass 2 "cortex_a53_alu*"
> +              "cortex_a53_r2f_cvt")
> +
> +(define_bypass 3 "cortex_a53_mul,
> +               cortex_a53_load1,
> +               cortex_a53_load2"
> +              "cortex_a53_r2f_cvt")
> +
>  ;; Model flag forwarding to branches.
>  
>  (define_bypass 0 "cortex_a53_alu*,cortex_a53_shift*"
> @@ -514,16 +523,24 @@
>  ;; Floating-point to/from core transfers.
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>  
> -(define_insn_reservation "cortex_a53_r2f" 6
> +(define_insn_reservation "cortex_a53_r2f" 2
>    (and (eq_attr "tune" "cortexa53")
> -       (eq_attr "type" "f_mcr,f_mcrr,f_cvti2f,
> -                     neon_from_gp, neon_from_gp_q"))
> -  "cortex_a53_slot_any,nothing*2,cortex_a53_fp_alu")
> +       (eq_attr "type" "f_mcr,f_mcrr"))
> +  "cortex_a53_slot_any,cortex_a53_fp_alu")
> +
> +(define_insn_reservation "cortex_a53_f2r" 4
> +  (and (eq_attr "tune" "cortexa53")
> +       (eq_attr "type" "f_mrc,f_mrrc"))
> +  "cortex_a53_slot_any,cortex_a53_fp_alu")
> +
> +(define_insn_reservation "cortex_a53_r2f_cvt" 4
> +  (and (eq_attr "tune" "cortexa53")
> +       (eq_attr "type" "f_cvti2f, neon_from_gp, neon_from_gp_q"))
> +  "cortex_a53_slot_any,cortex_a53_fp_alu")
>  
> -(define_insn_reservation "cortex_a53_f2r" 6
> +(define_insn_reservation "cortex_a53_f2r_cvt" 5
>    (and (eq_attr "tune" "cortexa53")
> -       (eq_attr "type" "f_mrc,f_mrrc,f_cvtf2i,
> -                     neon_to_gp, neon_to_gp_q"))
> +       (eq_attr "type" "f_cvtf2i, neon_to_gp, neon_to_gp_q"))
>    "cortex_a53_slot_any,cortex_a53_fp_alu")
>  
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> 
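
For readers following the thread, here is a minimal sketch (not part of the
patch, and not how genautomata is implemented) of how the new numbers combine:
a define_bypass entry overrides the producer's default reservation latency for
a specific producer/consumer pair. The tables are copied from the hunks above;
the function name, the fallback parameter, and the use of Python's fnmatch to
approximate the "*" wildcard are assumptions made for illustration. Latencies
of producers such as cortex_a53_mul are not shown in this diff, so the sketch
returns the fallback for them rather than guessing.

```python
from fnmatch import fnmatchcase

# Default result latencies of the four reservations touched by the patch.
RESERVATION_LATENCY = {
    "cortex_a53_r2f": 2,
    "cortex_a53_f2r": 4,
    "cortex_a53_r2f_cvt": 4,
    "cortex_a53_f2r_cvt": 5,
}

# (latency, producer patterns, consumer patterns) as they appear in the
# first hunk. No two entries overlap, so lookup order does not matter here.
BYPASSES = [
    (1, ["cortex_a53_mul", "cortex_a53_load1", "cortex_a53_load2"],
        ["cortex_a53_r2f"]),
    (2, ["cortex_a53_alu*"], ["cortex_a53_r2f_cvt"]),
    (3, ["cortex_a53_mul", "cortex_a53_load1", "cortex_a53_load2"],
        ["cortex_a53_r2f_cvt"]),
]


def _matches(name, patterns):
    """Return True if name matches any of the glob-style patterns."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def effective_latency(producer, consumer, fallback=None):
    """Latency from producer to consumer: a bypass if one applies,
    otherwise the producer's default reservation latency.

    fallback is a hypothetical stand-in for reservations whose default
    latency is not visible in this diff.
    """
    for latency, producers, consumers in BYPASSES:
        if _matches(producer, producers) and _matches(consumer, consumers):
            return latency
    return RESERVATION_LATENCY.get(producer, fallback)
```

For example, effective_latency("cortex_a53_load1", "cortex_a53_r2f") gives the
bypass value 1, while the same load feeding "cortex_a53_r2f_cvt" gives 3, which
is the distinction between transfers and conversions that the patch introduces.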
