https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114729

Vineet Gupta <vineetg at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2024-04-15 00:00:00         |2024-4-16

--- Comment #9 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
So I started with the reg being spilled (a1):

.L2:
        beq     a1,zero,.L5    # if j[1] == 0
        li      a2,1
        ble     a6,s11,.L2    # if j[0] < 1
        sd      a1,8(sp)                # spill (save)


.L3:                       # inner loop start
       ...

        blt  a2,a6,.L3    # inner loop end

        ld      a1,8(sp)                # spill (restore)
        j       .L2

Next I zoomed into the inner loop, where a1 is used/clobbered with sched1 but
not without it. Both versions are shown below with my rudimentary def, use,
dead annotations.

------------------------------------------------------------------------------
        -fschedule-insns (NOK)       |         -fno-schedule-insns (OK)
------------------------------------------------------------------------------
1-def      ld    a5,%lo(u)(s0) #u, u | 1-def       ld    a5,%lo(u)(t6)  # u, u
2-def      srliw a0,a5,16            | 2-def       srliw s10,a5,16
3-def      srli  a1,a5,32            | 1-use       sh    a5,%lo(_Z1sv)(a4)
1-use      sh    a5,%lo(_Z1sv)(a3)   | 2-dead      sh    s10,%lo(_Z1sv+2)(a4)
              ---insn1---            | 3-def       srli  s10,a5,32
1-use      srli  a5,a5,48            | 1-use       srli  a5,a5,48
              ---insn2---            | 1-dead      sh    a5,%lo(_Z1sv+6)(a4)
2-dead     sh    a0,%lo(_Z1sv+2)(a3) |              ---insn1---
3-dead     sh    a1,%lo(_Z1sv+4)(a3) |              ---insn2---
1-dead     sh    a5,%lo(_Z1sv+6)(a3) | 3-dead      sh    s10,%lo(_Z1sv+4)(a4)

The problem seems to be the longer live range of 2-def (on the left side). If
it were used/dead right after its definition, 3-def wouldn't need a new
register.
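
To make the live-range argument concrete, here is a rough Python sketch that
replays the two instruction orders from the table above and counts how many of
the three annotated values (1, 2, 3) are live at once. It is only a toy model:
it tracks just those three values, treats insn1/insn2 as touching none of them,
and is not GCC's register pressure calculation. The names SCHED1, NO_SCHED1,
live_ranges and max_pressure are made up for this illustration.

```python
# Each entry: (instruction text, annotated values it defines or uses).
SCHED1 = [
    ("ld    a5,%lo(u)(s0)", [1]),            # 1-def
    ("srliw a0,a5,16", [2, 1]),              # 2-def, reads value 1
    ("srli  a1,a5,32", [3, 1]),              # 3-def, reads value 1
    ("sh    a5,%lo(_Z1sv)(a3)", [1]),        # 1-use
    ("insn1", []),
    ("srli  a5,a5,48", [1]),                 # 1-use
    ("insn2", []),
    ("sh    a0,%lo(_Z1sv+2)(a3)", [2]),      # 2-dead
    ("sh    a1,%lo(_Z1sv+4)(a3)", [3]),      # 3-dead
    ("sh    a5,%lo(_Z1sv+6)(a3)", [1]),      # 1-dead
]

NO_SCHED1 = [
    ("ld    a5,%lo(u)(t6)", [1]),            # 1-def
    ("srliw s10,a5,16", [2, 1]),             # 2-def, reads value 1
    ("sh    a5,%lo(_Z1sv)(a4)", [1]),        # 1-use
    ("sh    s10,%lo(_Z1sv+2)(a4)", [2]),     # 2-dead
    ("srli  s10,a5,32", [3, 1]),             # 3-def, reads value 1
    ("srli  a5,a5,48", [1]),                 # 1-use
    ("sh    a5,%lo(_Z1sv+6)(a4)", [1]),      # 1-dead
    ("insn1", []),
    ("insn2", []),
    ("sh    s10,%lo(_Z1sv+4)(a4)", [3]),     # 3-dead
]


def live_ranges(schedule):
    """Map each value to (first position, last position) where it appears."""
    ranges = {}
    for pos, (_, values) in enumerate(schedule):
        for value in values:
            first, _ = ranges.get(value, (pos, pos))
            ranges[value] = (first, pos)
    return ranges


def max_pressure(schedule):
    """Largest number of tracked values live at any single position."""
    ranges = live_ranges(schedule)
    return max(
        sum(1 for first, last in ranges.values() if first <= pos <= last)
        for pos in range(len(schedule))
    )


if __name__ == "__main__":
    for name, schedule in (("sched1", SCHED1), ("no-sched1", NO_SCHED1)):
        print(name, live_ranges(schedule), "max live:", max_pressure(schedule))
```

In this model all three values overlap in the sched1 order, while in the
no-sched1 order values 2 and 3 never overlap, which is consistent with both of
them sharing s10 on the right side.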

With that insight, I can now start looking into the sched1 dumps of the
corresponding BB.

;;       10--> b  0: i  35 r170#0=[r242+low(`u')]                 :alu:@GR_REGS+1(1)@FP_REGS+0(0)
;;       11--> b  0: i  79 r209=[r229+low(`f')]                   :alu:GR_REGS+0(0)FP_REGS+1(1)
;;       12--> b  0: i  76 r141=fix(r206)                         :alu:@GR_REGS+1(1)@FP_REGS+0(-1)
;;       13--> b  0: i  46 r180=zxt(r170,0x10,0x10)               :alu:@GR_REGS+1(1)@FP_REGS+0(0)
;;       14--> b  0: i  55 r188=r170 0>>0x20                      :alu:GR_REGS+1(1)FP_REGS+0(0)
;;       15--> b  0: i  81 r210=r141<<0x3                         :alu:GR_REGS+1(0)FP_REGS+0(0)
;;       16--> b  0: i  82 r211=r143+r210                         :alu:GR_REGS+1(0)FP_REGS+0(0)
;;       17--> b  0: i  44 [r230+low(`_Z1sv')]=r170#0             :alu:@GR_REGS+0(0)@FP_REGS+0(0)
;;       18--> b  0: i  65 r197=r170 0>>0x30                      :alu:GR_REGS+1(0)FP_REGS+0(0)
;;       19--> b  0: i  54 [r230+low(const(`_Z1sv'+0x2))]=r180#0  :alu:@GR_REGS+0(-1)@FP_REGS+0(0)
;;       20--> b  0: i  64 [r230+low(const(`_Z1sv'+0x4))]=r188#0  :alu:GR_REGS+0(-1)FP_REGS+0(0)
;;       21--> b  0: i  73 [r230+low(const(`_Z1sv'+0x6))]=r197#0  :alu:GR_REGS+0(-1)FP_REGS+0(0)
