https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88560
--- Comment #5 from Vladimir Makarov <vmakarov at gcc dot gnu.org> --- We have too many tests checking expected generated code. We should more focus on overall effect of the change. SPEC would be a good criterium although it is hard to check SPEC for each patch. I've checked the generated code of arm8_2-fp16-move-1.c and found that in most cases GCC generates better code with the patch: @@ -80,7 +80,6 @@ test_load_store_1: @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. lsl r1, r1, #1 - vmov.f16 s0, r3 @ __fp16 ldrh r3, [r2, r1] @ __fp16 strh r3, [r0, r1] @ __fp16 bx lr @@ -97,10 +96,11 @@ test_load_store_2: @ link register save eliminated. add r1, r1, #2 lsl r1, r1, #1 + add r3, r2, r1 add r0, r0, r1 - ldrh r3, [r2, r1] @ __fp16 + vld1.16 {d0[0]}, [r3] + vmov.f16 r3, s0 @ __fp16 strh r3, [r0, #-4] @ __fp16 - vmov.f16 s0, r3 @ __fp16 bx lr .size test_load_store_2, .-test_load_store_2 .align 2 @@ -176,14 +176,13 @@ test_select_5: @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 - vcvtb.f32.f16 s14, s1 - vmov s15, s1 @ __fp16 - vcmpe.f32 s0, s14 + vcvtb.f32.f16 s15, s1 + vcmpe.f32 s0, s15 vmrs APSR_nzcv, FPSCR bmi .L17 - vmov s15, s2 @ __fp16 + vmov s1, s2 @ __fp16 .L17: - vmov s0, s15 @ __fp16 + vmov s0, s1 @ __fp16 bx lr .size test_select_5, .-test_select_5 .align 2 @@ -197,14 +196,13 @@ test_select_6: @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 - vcvtb.f32.f16 s14, s1 - vmov s15, s1 @ __fp16 - vcmpe.f32 s0, s14 + vcvtb.f32.f16 s15, s1 + vcmpe.f32 s0, s15 vmrs APSR_nzcv, FPSCR bls .L19 - vmov s15, s2 @ __fp16 + vmov s1, s2 @ __fp16 .L19: - vmov s0, s15 @ __fp16 + vmov s0, s1 @ __fp16 bx lr .size test_select_6, .-test The only worse code is for test_load_store_2. The culprit insn is (insn 12 10 16 2 (set (mem:HF (plus:SI (reg/f:SI 121) (const_int -4 [0xfffffffffffffffc])) [1 *_6+0 S2 A16]) (reg:HF 116 [ <retval> ])) "b.i":4:8 653 {*movhf_vfp_fp16} (expr_list:REG_DEAD (reg/f:SI 121) (nil))) Before the patch p116 had GENERAL_REGS class and after the patch it has VFP_LO_REGS. The memory move cost for VFP_LO_REGS is small although the insn for VFP_LO_REGS prohibits memory with this kind address. There are a few ways to fix this: 1. give RA correct cost for store VFP_LO_REGS into given memory which means adding a new hook taking specific memory rtx 2. check insn constraints to modify cost in some way 3. try to reload address instead of memory itself I prefer the last solution although it requires changes in sensitive part of LRA and will require a lot testing on a few targets. So I am working on the patch. In any case, I don't think the PR should have P1. P3 or even P4 would be more accurate estimation (please see my top comments).