https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98673
            Bug ID: 98673
           Summary: pass fre4 inhibit pass dom3 to create much more optimized code
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rjiejie at me dot com
  Target Milestone: ---

Created attachment 49962
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49962&action=edit
bug test file

a) compiler options:
   cc1 -mabi=lp64d -march=rv64gc -O2 -S

b) hot code in function t_run_test:

        j       .L30
.L39:
        mv      a4,a3
.L30:
        ld      a2,8(a5)
        addi    a3,a4,1
        slli    t3,a4,3
        ble     a2,a1,.L28
        ld      t5,0(a5)
        bge     a1,t5,.L50
.L28:
        addi    a5,a5,8
        bne     a3,a0,.L39      # hot loop branches back to .L39

better code in version 8.4 with the same compiler options:
=====================================================

.L30:
        ld      t1,8(a4)
        slli    a7,a5,3
        ble     t1,a3,.L28
        ld      t4,0(a4)
        bge     a3,t4,.L50
.L28:
        addi    a5,a5,1
        addi    a4,a4,8
        bne     a5,t3,.L30      # hot loop branches back to .L30

GCC 10.2.0 emits one more instruction per loop iteration than GCC 8.4.0
(the extra "mv a4,a3" at .L39).

analysis of the GCC passes in v10.2.0:
===========================================

before pass fre4:
----------------

  <bb 8> [local count: 82176881]:
  engLoad.11_20 = engLoad;
  loadValue.13_26 = loadValue;
  _410 = (unsigned long) numXEntries.17_218;
  _409 = _410 + 18446744073709551615;
  _408 = (long int) _409;
  ...
  ...
  <bb 12> [local count: 986782143]:
  i1_174 = i1_6 + 1;
  if (i1_174 != _408)
    goto <bb 9>; [94.50%]
  else
    goto <bb 13>; [5.50%]

  <bb 13> [local count: 54273018]:
  # i1_420 = PHI <i1_174(12)>
  _433 = (long unsigned int) i1_420;
  _434 = _433 + 1;
  _435 = _434 * 8;
  _436 = i1_420 + 1;
  _440 = _435 - 8;
  _442 = engLoad.11_20 + _440;
  goto <bb 15>; [100.00%]

after pass fre4:
---------------

  <bb 8> [local count: 82176881]:
  engLoad.11_20 = engLoad;
  loadValue.13_26 = loadValue;
  _410 = (unsigned long) numXEntries.17_218;
  _409 = _410 + 18446744073709551615;
  ...
  ...
  <bb 12> [local count: 986782143]:
  i1_174 = i1_6 + 1;
  if (i1_174 != _213)
    goto <bb 9>; [94.50%]
  else
    goto <bb 13>; [5.50%]

  <bb 13> [local count: 54273018]:
  _433 = (long unsigned int) i1_174;
  _434 = _433 + 1;
  _435 = _434 * 8;
  _436 = i1_174 + 1;
  _440 = _435 - 8;
  _442 = engLoad.11_20 + _440;
  goto <bb 15>; [100.00%]

Pass fre4 reports 'Removing dead stmt _408 = (long int) _409;'.

Pass dom3 cannot optimize '_433 = (long unsigned int) i1_174;' in <bb 13>
when <bb 13> uses the same SSA name i1_174 as <bb 12>. This leads to a
conflict in pass expand when it processes the coalesced SSA/PHI nodes, and
an edge is then split.

Any help is appreciated :)
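
For readers of the dumps, the arithmetic in <bb 8> and <bb 13> can be restated in C. This is only a sketch for reading the GIMPLE, not code from the attached test file. The function names are hypothetical, and it assumes 64-bit long / unsigned long as on the lp64d ABI. It shows that adding 18446744073709551615 is an unsigned "minus 1", and that the _433 to _440 chain is equivalent to i * 8.

```c
#include <stdint.h>

/* Mirrors "_409 = _410 + 18446744073709551615;" in <bb 8>.
   Adding 2^64 - 1 to a 64-bit unsigned value wraps around,
   which is the same as subtracting 1. */
uint64_t count_minus_one(uint64_t n)
{
    return n + UINT64_C(18446744073709551615);
}

/* Mirrors the _433 .. _440 chain in <bb 13>, step by step. */
uint64_t offset_as_in_dump(int64_t i)
{
    uint64_t t433 = (uint64_t)i;  /* _433 = (long unsigned int) i1 */
    uint64_t t434 = t433 + 1;     /* _434 = _433 + 1 */
    uint64_t t435 = t434 * 8;     /* _435 = _434 * 8 */
    return t435 - 8;              /* _440 = _435 - 8 */
}

/* The same byte offset written directly: i * 8 in unsigned
   (modulo 2^64) arithmetic. */
uint64_t offset_simplified(int64_t i)
{
    return (uint64_t)i * 8;
}
```

Both offset functions return the same value for every input because unsigned arithmetic wraps modulo 2^64, so (i + 1) * 8 - 8 and i * 8 always agree.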