https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89154
Bug ID: 89154
Summary: 5% degradation of CPU2006 473.astar starting with r266305
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pthaugen at gcc dot gnu.org
CC: dje at gcc dot gnu.org, rguenth at gcc dot gnu.org, segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org
Target Milestone: ---
Host: powerpc64le-unknown-linux-gnu
Target: powerpc64le-unknown-linux-gnu
Build: powerpc64le-unknown-linux-gnu

Not sure if this is really a tree-optimization issue; I just picked that as the initial component since the fix dealt with it. It could possibly be an rtl-optimization/shrink-wrap issue brought about by additional register pressure from CSE'ing/hoisting some additional code.

Function way2obj::releasepoint() degrades 20% starting with r266305. Looking at the perf output, the main difference seems to be that we're no longer shrink-wrapping the early exit test at the start of the function. Following is the annotated assembly of the start of the function.

r266304:
--------
0000000010006a40 <_ZN7way2obj12releasepointEii>:
/* way2obj::releasepoint(int, int) total: 2032811 22.9279 */
                  : 10006a40: lis    r2,4098
                  : 10006a44: addi   r2,r2,32512
   95384  1.0758  : 10006a48: lwz    r9,4424(r3)
                  : 10006a4c: ld     r8,8(r3)
  119001  1.3422  : 10006a50: lhz    r7,16(r3)
       1  1.1e-05 : 10006a54: mullw  r9,r9,r5
                  : 10006a58: add    r9,r9,r4
                  : 10006a5c: extsw  r9,r9
  169526  1.9121  : 10006a60: rldicr r9,r9,2,61
                  : 10006a64: lhzx   r10,r8,r9
   21865  0.2466  : 10006a68: cmpw   r10,r7
                  : 10006a6c: beqlr

r266305:
--------
0000000010006a40 <_ZN7way2obj12releasepointEii>:
/* way2obj::releasepoint(int, int) total: 2440798 26.2354 */
                  : 10006a40: lis    r2,4098
                  : 10006a44: addi   r2,r2,32512
   35498  0.3816  : 10006a48: lwa    r6,4424(r3)
                  : 10006a4c: ld     r7,8(r3)
   26361  0.2833  : 10006a50: std    r30,-16(r1)
                  : 10006a54: mr     r30,r3
  157660  1.6946  : 10006a58: mfcr   r12
  162000  1.7413  : 10006a5c: lhz    r3,16(r3)
      17  1.8e-04 : 10006a60: std    r23,-72(r1)
     139  0.0015  : 10006a64: mr     r23,r4
       2  2.1e-05 : 10006a68: mullw  r9,r6,r5
      59  6.3e-04 : 10006a6c: stw    r12,8(r1)
  244832  2.6316  : 10006a70: stdu   r1,-112(r1)
       4  4.3e-05 : 10006a74: add    r9,r9,r4
       5  5.4e-05 : 10006a78: extsw  r9,r9
     201  0.0022  : 10006a7c: rldicr r8,r9,2,61
     343  0.0037  : 10006a80: add    r4,r7,r8
       9  9.7e-05 : 10006a84: lhzx   r10,r7,r8
  151595  1.6294  : 10006a88: cmpw   r10,r3
                  : 10006a8c: beq    10006c64 <_ZN7way2obj12releasepointEii+0x224>

The target of the conditional branch in the slow version is just the epilogue code to restore R1, R23, R30 and CR3/CR4 and return.
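
For readers without access to the SPEC sources, the shape of code under discussion can be sketched roughly as follows. This is not the 473.astar source: the type and member names (Cell, Way2ObjSketch, slow_path_calls) and the body of the slow path are assumptions made up for illustration. Only the early-exit pattern (load a width, index a map with width * y + x, compare a halfword against a member, return when they are equal) is taken from the assembly above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 4-byte map cell; the assembly only shows that a halfword
// at the start of each 4-byte element is loaded and compared.
struct Cell {
    std::uint16_t fillnum;
    std::uint16_t other;
};

// Hypothetical stand-in for way2obj; layout and names are assumptions.
struct Way2ObjSketch {
    std::vector<Cell> map;
    std::uint16_t fillnum;
    int width;
    int slow_path_calls;  // illustration only: counts slow-path executions

    Way2ObjSketch(int w, int h, std::uint16_t fill)
        : map(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), Cell{0, 0}),
          fillnum(fill), width(w), slow_path_calls(0) {}

    void releasepoint(int x, int y) {
        int index = width * y + x;

        // Early exit test. With shrink-wrapping, nothing before this
        // return needs a stack frame or callee-saved registers.
        if (map[index].fillnum == fillnum)
            return;

        // Placeholder slow path (assumption): the real function does more
        // work here, which is what requires the prologue and epilogue.
        map[index].fillnum = fillnum;
        ++slow_path_calls;
    }
};
```

In the r266304 code the early exit compiles down to a bare beqlr, so calls that take it never pay for the register saves or the stdu. In the r266305 code the saves of r23, r30 and CR plus the frame allocation all execute before the compare, and the early exit has to branch to the epilogue to undo them.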