On 2023/12/12 at 7:26 PM, Xi Ruoyao wrote:
On Tue, 2023-12-12 at 19:14 +0800, Jiahao Xu wrote:
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that a short-circuit branch
is kept as a short-circuit operation instead of being converted to a
non-short-circuit operation.

This gives a 1.8% improvement in SPECCPU 2017 fprate on 3A6000.
In r14-15 we removed the LOGICAL_OP_NON_SHORT_CIRCUIT definition because
the default value (1 for all current LoongArch CPUs with branch_cost = 6)
may reduce the number of conditional branch instructions.
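
As a rough illustration of why the default works out to 1 here: to my understanding, when a target does not define the macro, the middle-end falls back to a test of the form "branch cost >= 2". The sketch below only models that fallback. The names model_branch_cost, MODEL_LOGICAL_OP_NON_SHORT_CIRCUIT and model_uses_non_short_circuit are hypothetical stand-ins, not GCC identifiers, and the threshold is an assumption that should be checked against the GCC sources.

```c
/* Stand-in for the target's branch cost; 6 is the value quoted above
   for current LoongArch CPUs.  */
static const int model_branch_cost = 6;

/* Assumed shape of the generic fallback: prefer the non-short-circuit
   form once a branch is estimated to cost at least 2.  A target that
   defines the macro itself (e.g. as 0) overrides this.  */
#ifndef MODEL_LOGICAL_OP_NON_SHORT_CIRCUIT
#define MODEL_LOGICAL_OP_NON_SHORT_CIRCUIT (model_branch_cost >= 2)
#endif

/* Returns nonzero when the modeled default picks the
   non-short-circuit form.  */
int
model_uses_non_short_circuit (void)
{
  return MODEL_LOGICAL_OP_NON_SHORT_CIRCUIT;
}
```

With a branch cost of 6 the modeled default is nonzero, which matches the "1 for all current LoongArch CPUs" remark above.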

I guess the problem here is that the floating-point compare instruction
is much more costly than other instructions, but this is not correctly
modeled yet.  Could you try
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640012.html
where I've raised the fp_add cost (which is used for estimating the
floating-point compare cost) to 5 instructions, and see if it solves
your problem without LOGICAL_OP_NON_SHORT_CIRCUIT?
I don't think this is the same issue as the cost of floating-point comparison instructions. The definition of LOGICAL_OP_NON_SHORT_CIRCUIT controls how a short-circuit branch such as (A AND-IF B) is evaluated, and that is not directly related to the cost of floating-point comparison instructions. I will test it with SPECCPU 2017.
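
To make the distinction concrete, here is a minimal C sketch of the two evaluation strategies for the condition used in the new test. It is not taken from the patch, and the function names are hypothetical. The first form branches after each comparison; the second evaluates all six comparisons and combines the results, which is roughly the shape the middle-end may produce when LOGICAL_OP_NON_SHORT_CIRCUIT is nonzero.

```c
#include <assert.h>

/* Short-circuit form: evaluation stops at the first comparison that
   is true, so each comparison may become its own conditional branch.  */
int
cond_short_circuit (const float *a)
{
  if (a[0] > a[3] || a[1] < a[2] || a[0] > a[5]
      || a[1] < a[4] || a[2] > a[5] || a[3] < a[4])
    return 0;
  return 1;
}

/* Non-short-circuit form: all six comparisons are always evaluated
   and their 0/1 results are OR-ed together before a single test.  */
int
cond_non_short_circuit (const float *a)
{
  int any = (a[0] > a[3]) | (a[1] < a[2]) | (a[0] > a[5])
            | (a[1] < a[4]) | (a[2] > a[5]) | (a[3] < a[4]);
  return !any;
}

/* Hypothetical helper: the comparisons have no side effects, so both
   forms must return the same value for any input.  */
void
check_same_result (const float *a)
{
  assert (cond_short_circuit (a) == cond_non_short_circuit (a));
}
```

The two forms differ only in how many comparisons and branches are executed, not in the result, so which one is faster depends on the relative cost of branches and comparisons on the target.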
If not, I guess you can try increasing the floating-point comparison cost
further in loongarch_rtx_costs:

     case UNLT:
       /* Branch comparisons have VOIDmode, so use the first operand's
          mode instead.  */
       mode = GET_MODE (XEXP (x, 0));
       if (FLOAT_MODE_P (mode))
         {
           *total = loongarch_cost->fp_add;


Try to make it fp_add + something?

           return false;
         }
       *total = loongarch_binary_cost (x, COSTS_N_INSNS (1), COSTS_N_INSNS (4),
                                       speed);
       return true;


If adjusting the cost model does not work, I'd say this is a middle-end
issue and we should submit a bug report.

gcc/ChangeLog:

        * config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Define.

gcc/testsuite/ChangeLog:

        * gcc.target/loongarch/short-circuit.c: New test.

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index f1350b6048f..880c576c35b 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -869,6 +869,7 @@ typedef struct {
     1 is the default; other values are interpreted relative to that.  */
 #define BRANCH_COST(speed_p, predictable_p) loongarch_branch_cost
+#define LOGICAL_OP_NON_SHORT_CIRCUIT 0
 /* Return the asm template for a conditional branch instruction.
     OPCODE is the opcode's mnemonic and OPERANDS is the asm template for
diff --git a/gcc/testsuite/gcc.target/loongarch/short-circuit.c b/gcc/testsuite/gcc.target/loongarch/short-circuit.c
new file mode 100644
index 00000000000..bed585ee172
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/short-circuit.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-gimple" } */
+
+int
+short_circuit (float *a)
+{
+  float t1x = a[0];
+  float t2x = a[1];
+  float t1y = a[2];
+  float t2y = a[3];
+  float t1z = a[4];
+  float t2z = a[5];
+
+  if (t1x > t2y  || t2x < t1y  || t1x > t2z || t2x < t1z || t1y > t2z || t2y < t1z)
+    return 0;
+
+  return 1;
+}
+/* { dg-final { scan-tree-dump-times "if" 6 "gimple" } } */
