Ok, after spending the day on going down the rabbit hole of trying to optimize just about every, here are my patches.
Note, I simplified the constraints to eliminate some rare possibilities, like optimizing converting from double to long double if the double happened to be in a GPR register and the long double value was stored in memory (but there never was an optimization for having the double value be in a GPR and the long double value to also be a GPR). I also separated the VSX case from the non-VSX case. This is to simplify things at the RTL level (non-VSX must load up 0.0 to put into the lower word, while VSX can use the XXLXOR instruction to clear the register). I dropped support in the insns for extending the DFmode value to TFmode that is located in memory directly. Now, the compiler builds the whole value in FPRs and then does the store. This simplifies the code somewhat, and SPE/ieeequad paths require the value to be in registers, which might lead to other lra bugs. In the case of just doing one conversion in straight line code, it just changes the register allocation somewhat (allocate 1 TFmode pseudo instead of 1-2 DFmode psuedos). I have bootstrapped the compiler on little endian power8 with no regressions. I have built the test case with various options (-mlra vs. no -mlra, 32-bit, 64-bit, power5/power6/power7/power8), and it all builds correctly. Is this patch ok to apply to the trunk? I would like to apply this patch to GCC 5.x as well. However, in doing the patch, this patch touches areas that I've been working on for IEEE 128-bit floating point support, and so the patch will need to be reworked for GCC 5.x. Is it ok to install in the trunk? In addition, I will need to modify this area again with the next IEEE 128-bit floating point support patch, but I wanted to separate this patch out so that it could be considered by itself, and back ported to GCC 5.x. [gcc] 2015-10-05 Michael Meissner <meiss...@linux.vnet.ibm.com> Peter Bergner <berg...@vnet.ibm.com> PR target/67808 * config/rs6000/rs6000.md (extenddftf2): In the expander, only allow registers, but provide insns for the combiner to create for loads from memory. Separate VSX code from non-VSX code. For non-VSX code, combine extenddftf2_fprs into extenddftf2 and rename externaldftf2_internal to externaldftf2_fprs. Reorder constraints so that registers come before memory operations. Drop support from converting DFmode to TFmode, if the DFmode value is in a GPR register, and the TFmode is in memory. (extenddftf2_fprs): Likewise. (extenddftf2_internal): Likewise. (extenddftf2_vsx): Likewise. (extendsftf2): In the expander, only allow registers, but provide insns for the combiner to create for stores and loads. [gcc/testsuite] 2015-10-05 Michael Meissner <meiss...@linux.vnet.ibm.com> Peter Bergner <berg...@vnet.ibm.com> PR target/67808 * gcc.target/powerpc/pr67808.c: New test. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (revision 228496) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -6471,8 +6471,8 @@ (define_insn_and_split "*mov<mode>_softf [(set_attr "length" "20,20,16")]) (define_expand "extenddftf2" - [(set (match_operand:TF 0 "nonimmediate_operand" "") - (float_extend:TF (match_operand:DF 1 "input_operand" "")))] + [(set (match_operand:TF 0 "gpc_reg_operand" "") + (float_extend:TF (match_operand:DF 1 "gpc_reg_operand" "")))] "TARGET_HARD_FLOAT && (TARGET_FPRS || TARGET_E500_DOUBLE) && TARGET_LONG_DOUBLE_128" { @@ -6480,52 +6480,55 @@ (define_expand "extenddftf2" rs6000_expand_float128_convert (operands[0], operands[1], false); else if (TARGET_E500_DOUBLE) emit_insn (gen_spe_extenddftf2 (operands[0], operands[1])); + else if (TARGET_VSX) + emit_insn (gen_extenddftf2_vsx (operands[0], operands[1])); else - emit_insn (gen_extenddftf2_fprs (operands[0], operands[1])); + { + rtx zero = gen_reg_rtx (DFmode); + rs6000_emit_move (zero, CONST0_RTX (DFmode), DFmode); + emit_insn (gen_extenddftf2_fprs (operands[0], operands[1], zero)); + } DONE; }) -(define_expand "extenddftf2_fprs" - [(parallel [(set (match_operand:TF 0 "nonimmediate_operand" "") - (float_extend:TF (match_operand:DF 1 "input_operand" ""))) - (use (match_dup 2))])] - "!TARGET_IEEEQUAD - && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT - && TARGET_LONG_DOUBLE_128" +;; Allow memory operands for the source to be created by the combiner. +(define_insn_and_split "extenddftf2_fprs" + [(set (match_operand:TF 0 "gpc_reg_operand" "=d,d,&d") + (float_extend:TF (match_operand:DF 1 "nonimmediate_operand" "d,m,d"))) + (use (match_operand:DF 2 "nonimmediate_operand" "m,m,d"))] + "!TARGET_VSX && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT + && TARGET_LONG_DOUBLE_128 && !TARGET_IEEEQUAD" + "#" + "&& reload_completed" + [(set (match_dup 3) (match_dup 1)) + (set (match_dup 4) (match_dup 2))] { - /* VSX can create 0.0 directly, otherwise let rs6000_emit_move create - the proper constant. */ - if (TARGET_VSX) - operands[2] = CONST0_RTX (DFmode); - else - { - operands[2] = gen_reg_rtx (DFmode); - rs6000_emit_move (operands[2], CONST0_RTX (DFmode), DFmode); - } + const int lo_word = LONG_DOUBLE_LARGE_FIRST ? GET_MODE_SIZE (DFmode) : 0; + const int hi_word = LONG_DOUBLE_LARGE_FIRST ? 0 : GET_MODE_SIZE (DFmode); + + operands[3] = simplify_gen_subreg (DFmode, operands[0], TFmode, hi_word); + operands[4] = simplify_gen_subreg (DFmode, operands[0], TFmode, lo_word); }) -(define_insn_and_split "*extenddftf2_internal" - [(set (match_operand:TF 0 "nonimmediate_operand" "=m,Y,ws,d,&d") - (float_extend:TF (match_operand:DF 1 "input_operand" "d,r,md,md,md"))) - (use (match_operand:DF 2 "zero_reg_mem_operand" "d,r,j,m,d"))] - "!TARGET_IEEEQUAD - && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT - && TARGET_LONG_DOUBLE_128" +(define_insn_and_split "extenddftf2_vsx" + [(set (match_operand:TF 0 "gpc_reg_operand" "=d,d") + (float_extend:TF (match_operand:DF 1 "nonimmediate_operand" "ws,m")))] + "TARGET_LONG_DOUBLE_128 && TARGET_VSX && !TARGET_IEEEQUAD" "#" "&& reload_completed" - [(pc)] + [(set (match_dup 2) (match_dup 1)) + (set (match_dup 3) (match_dup 4))] { const int lo_word = LONG_DOUBLE_LARGE_FIRST ? GET_MODE_SIZE (DFmode) : 0; const int hi_word = LONG_DOUBLE_LARGE_FIRST ? 0 : GET_MODE_SIZE (DFmode); - emit_move_insn (simplify_gen_subreg (DFmode, operands[0], TFmode, hi_word), - operands[1]); - emit_move_insn (simplify_gen_subreg (DFmode, operands[0], TFmode, lo_word), - operands[2]); - DONE; + + operands[2] = simplify_gen_subreg (DFmode, operands[0], TFmode, hi_word); + operands[3] = simplify_gen_subreg (DFmode, operands[0], TFmode, lo_word); + operands[4] = CONST0_RTX (DFmode); }) (define_expand "extendsftf2" - [(set (match_operand:TF 0 "nonimmediate_operand" "") + [(set (match_operand:TF 0 "gpc_reg_operand" "") (float_extend:TF (match_operand:SF 1 "gpc_reg_operand" "")))] "TARGET_HARD_FLOAT && (TARGET_FPRS || TARGET_E500_DOUBLE) Index: gcc/testsuite/gcc.target/powerpc/pr67808.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/pr67808.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr67808.c (revision 0) @@ -0,0 +1,46 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */ +/* { dg-options "-O1 -mvsx -mlra -mcpu=power7" } */ + +/* PR 67808: LRA ICEs on simple double to long double conversion test case */ + +void +dfoo (long double *ldb1, double *db1) +{ + *ldb1 = *db1; +} + +long double +dfoo2 (double *db1) +{ + return *db1; +} + +long double +dfoo3 (double x) +{ + return x; +} + +void +ffoo (long double *ldb1, float *db1) +{ + *ldb1 = *db1; +} + +long double +ffoo2 (float *db1) +{ + return *db1; +} + +long double +ffoo3 (float x) +{ + return x; +} + +/* { dg-final { scan-assembler "xxlxor" } } */