Although LA664 really does have a 1-cycle movgr2cf, setting the correct value in the cost model seems to confuse the register allocator and severely hurts performance, especially for workloads such as OpenSSL 3.5.1 SHA512 and SPEC CPU 2017 exchange_r.

As movgr2cf is very rarely used (we could not even construct a test case that uses it), just remove the LA664 customization for it as a temporary solution.

gcc/ChangeLog:

	PR target/120476
	* config/loongarch/loongarch-def.cc
	(loongarch_cpu_rtx_cost_data): Remove movgr2cf cost customization
	for LA664.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch-def.cc | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
index dcd8d905c5f..dfb12eb946d 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -147,8 +147,10 @@ array_tune<loongarch_rtx_cost_data> loongarch_cpu_rtx_cost_data =
   array_tune<loongarch_rtx_cost_data> ()
     .set (TUNE_LA664,
           loongarch_rtx_cost_data ()
-            .movcf2gr_ (COSTS_N_INSNS (1))
-            .movgr2cf_ (COSTS_N_INSNS (1)));
+            .movcf2gr_ (COSTS_N_INSNS (1)));
+
+/* FIXME: LA664 has 1-cycle movgr2cf as well, but setting the real value
+   here would pessimize the performance for some reason.  See PR120476.  */
 
 /* RTX costs to use when optimizing for size.
    We use a value slightly larger than COSTS_N_INSNS (1) for all of them
-- 
2.50.1