在 2023/3/6 下午4:02, Xi Ruoyao 写道:
On Mon, 2023-03-06 at 13:59 +0800, Xi Ruoyao via Gcc-patches wrote:
On Mon, 2023-03-06 at 11:16 +0800, Xi Ruoyao wrote:

/* snip */

Sorry for the late reply, the first patch I think is fine. But I haven't
reproduced the problem of the second mail.

Is there any special option in the configuration?
Oh some strange thing might be happening... I'll try to figure out what
has caused the behavior difference.
Oh no, the difference is caused by --enable-default-pie.

Maybe I should just add -fno-PIE for the dg-options.  But now I'm still
puzzled: why would -fPIE affect code generation on LoongArch?  AFAIK all
the code we are generating is position independent (at least for now).
Without -fPIE, the compiler stores a register with no reason:
Pushed the first patch as r13-6501.  The second one is dropped and I've
created PR109035 for the "unnecessary store" issue (after some failed
attempts to triage it).

Has the first patch been merged into the main branch yet?

I think there is one more test case that needs to be modified:

--- a/gcc/testsuite/gcc.target/loongarch/prolog-opt.c
+++ b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c
@@ -1,7 +1,7 @@
 /* Test that LoongArch backend stack drop operation optimized. */

 /* { dg-do compile } */
-/* { dg-options "-O2 -mabi=lp64d" } */
+/* { dg-options "-O2 -mabi=lp64d -fno-stack-protector" } */
 /* { dg-final { scan-assembler "addi.d\t\\\$r3,\\\$r3,-16" } } */


$ cat t.c
int test(int x)
{
         char buf[128 << 10];
         return buf[x];
}
$ ./gcc/cc1 t.c -nostdinc  -O2 -fdump-rtl-all -o- 2>/dev/null | grep test: -A20
test:
.LFB0 = .
         lu12i.w $r13,-135168>>12                        # 0xfffffffffffdf000
         ori     $r13,$r13,4080
         add.d   $r3,$r3,$r13
.LCFI0 = .
         lu12i.w $r12,-131072>>12                        # 0xfffffffffffe0000
         lu12i.w $r13,131072>>12                 # 0x20000
         add.d   $r13,$r13,$r12
         addi.d  $r12,$r3,16
         add.d   $r12,$r13,$r12
         lu12i.w $r13,131072>>12                 # 0x20000
         st.d    $r12,$r3,8
         ori     $r13,$r13,16
         ldx.b   $r4,$r12,$r4
         add.d   $r3,$r3,$r13
.LCFI1 = .
         jr      $r1
.LFE0:
         .size   test, .-test
         .section        .eh_frame,"aw",@progbits

Note the "st.d  $r12,$r3,8" instruction is completely meaningless.

The t.c.300r.ira dump contains some "interesting" thing:

Pass 0 for finding pseudo/allocno costs

     a0 (r87,l0) best GR_REGS, allocno GR_REGS
     a1 (r84,l0) best NO_REGS, allocno NO_REGS
     a2 (r83,l0) best GR_REGS, allocno GR_REGS

   a0(r87,l0) costs: SIBCALL_REGS:2000,2000 JIRL_REGS:2000,2000 
CSR_REGS:2000,2000 GR_REGS:2000,2000 FP_REGS:8000,8000 ALL_REGS:32000,32000 
MEM:8000,8000
   a1(r84,l0) costs: SIBCALL_REGS:1000000,1000000 JIRL_REGS:1000000,1000000 
CSR_REGS:1000000,1000000 GR_REGS:1000000,1000000 FP_REGS:1004000,1004000 
ALL_REGS:1016000,1016000 MEM:1004000,1004000
   a2(r83,l0) costs: SIBCALL_REGS:1000000,1000000 JIRL_REGS:1000000,1000000 
CSR_REGS:1000000,1000000 GR_REGS:1000000,1000000 FP_REGS:1004000,1004000 
ALL_REGS:1008000,1008000 MEM:1004000,1004000


Here r84 is the pseudo register for ($frame - 131072).  Any idea why the
compiler selects "NO_REGS" here?

FWIW RISC-V port suffers the same issue:
https://godbolt.org/z/aPorqj73b.


Reply via email to