I noticed that poor code is emitted for a long double 0.0.  This testcase

long double
sub (void)
{
  return 0.0;
}

void
sub2 (long double *ld)
{
  *ld = 0.0;
}

currently generates

sub:
	ldr	q0, .LC0
	ret
...
sub2:
	ldr	q0, .LC1
	str	q0, [x0]
	ret

where LC0 and LC1 are 16-byte constant table long double zeros.  With the
attached patch, I get

sub:
	movi	v0.2d, #0
	ret
...
sub2:
	stp	xzr, xzr, [x0]
	ret
The main problem is in aarch64_valid_floating_const, which rejects all
constants for TFmode.  There is a comment that says we should handle 0, but
not until after the movtf pattern is improved.  This improvement apparently
happened two years ago with this patch

2013-05-09  Sofiane Naci  <sofiane.n...@arm.com>

	* config/aarch64/aarch64.md: New movtf split.
	...

so this comment is no longer relevant, and we should handle 0 now.  The
patch deletes the out-of-date comment and moves the 0 check before the
TFmode check so that TFmode 0 is accepted.

There are a few other changes needed to make this work well.  The movtf
expander needs to avoid forcing 0 to a reg for a mem dest, just like the
movti pattern already does.  The Ump/?rY alternative needs to be split into
two, as %H doesn't work for a const_double 0; again, this is like the movti
pattern.  The condition needs to allow 0 values in operand 1, as is done in
the movti pattern.

I noticed another related problem while making this change.  The ldp/stp
instructions in the movtf_aarch64 pattern have neon attribute types.
However, these are integer instructions with matching 'r' constraints, and
hence should be using the load2/store2 attribute types, just like in the
movti pattern.

This was tested with a default-languages make bootstrap and make check on
an APM system.

There are some similar problems with the movsf and movdf patterns.  I plan
to submit a patch to fix them after this one is accepted.

Jim
2015-06-03  Jim Wilson  <jim.wil...@linaro.org>

	* config/aarch64/aarch64.c (aarch64_valid_floating_const): Move
	aarch64_float_const_zero_rtx_p check before TFmode check.
	* config/aarch64/aarch64.md (movtf): Don't call force_reg if op1
	is an fp zero.
	(movtf_aarch64): Separate ?rY alternative into two.  Adjust
	assembly code and attributes to match.  Change condition from
	register_operand to aarch64_reg_or_fp_zero for op1.  Change type
	for ldp from neon_load1_2reg to load2.  Change type for stp from
	neon_store1_2reg to store2.

Index: config/aarch64/aarch64.c
===================================================================
--- config/aarch64/aarch64.c	(revision 224054)
+++ config/aarch64/aarch64.c	(working copy)
@@ -7430,16 +7430,13 @@ aarch64_valid_floating_const (machine_mo
   if (!CONST_DOUBLE_P (x))
     return false;
 
-  /* TODO: We could handle moving 0.0 to a TFmode register,
-     but first we would like to refactor the movtf_aarch64
-     to be more amicable to split moves properly and
-     correctly gate on TARGET_SIMD.  For now - reject all
-     constants which are not to SFmode or DFmode registers.  */
+  if (aarch64_float_const_zero_rtx_p (x))
+    return true;
+
+  /* We only handle moving 0.0 to a TFmode register.  */
   if (!(mode == SFmode || mode == DFmode))
     return false;
 
-  if (aarch64_float_const_zero_rtx_p (x))
-    return true;
 
   return aarch64_float_const_representable_p (x);
 }
Index: config/aarch64/aarch64.md
===================================================================
--- config/aarch64/aarch64.md	(revision 224053)
+++ config/aarch64/aarch64.md	(working copy)
@@ -1040,18 +1040,20 @@ (define_expand "movtf"
 	FAIL;
     }
 
-  if (GET_CODE (operands[0]) == MEM)
+  if (GET_CODE (operands[0]) == MEM
+      && ! (GET_CODE (operands[1]) == CONST_DOUBLE
+	    && aarch64_float_const_zero_rtx_p (operands[1])))
     operands[1] = force_reg (TFmode, operands[1]);
   "
 )
 
 (define_insn "*movtf_aarch64"
   [(set (match_operand:TF 0
-	 "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r ,Ump")
+	 "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r ,Ump,Ump")
 	(match_operand:TF 1
-	 "general_operand"      " w,?r, ?r,w ,Y,Y ,m,w,Ump,?rY"))]
+	 "general_operand"      " w,?r, ?r,w ,Y,Y ,m,w,Ump,?r ,Y"))]
   "TARGET_FLOAT && (register_operand (operands[0], TFmode)
-    || register_operand (operands[1], TFmode))"
+    || aarch64_reg_or_fp_zero (operands[1], TFmode))"
   "@
    orr\\t%0.16b, %1.16b, %1.16b
    #
@@ -1062,12 +1064,13 @@ (define_insn "*movtf_aarch64"
    ldr\\t%q0, %1
    str\\t%q1, %0
    ldp\\t%0, %H0, %1
-   stp\\t%1, %H1, %0"
+   stp\\t%1, %H1, %0
+   stp\\txzr, xzr, %0"
   [(set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,fconstd,fconstd,\
-                     f_loadd,f_stored,neon_load1_2reg,neon_store1_2reg")
-   (set_attr "length" "4,8,8,8,4,4,4,4,4,4")
-   (set_attr "fp" "*,*,yes,yes,*,yes,yes,yes,*,*")
-   (set_attr "simd" "yes,*,*,*,yes,*,*,*,*,*")]
+                     f_loadd,f_stored,load2,store2,store2")
+   (set_attr "length" "4,8,8,8,4,4,4,4,4,4,4")
+   (set_attr "fp" "*,*,yes,yes,*,yes,yes,yes,*,*,*")
+   (set_attr "simd" "yes,*,*,*,yes,*,*,*,*,*,*")]
 )
 
 (define_split