https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66642
            Bug ID: 66642
           Summary: transform_to_exit_first_loop_alt doesn't use result of low iteration count loop
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Created attachment 35831
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35831&action=edit
patch to produce test case

Using the attached patch, we exercise the low iteration count loop generated by the parloops pass. The libgomp.c/parloops-exit-first-loop-alt-3.c testcase fails:
...
PASS: libgomp.c/parloops-exit-first-loop-alt-2.c (test for excess errors)
PASS: libgomp.c/parloops-exit-first-loop-alt-2.c execution test
PASS: libgomp.c/parloops-exit-first-loop-alt-3.c (test for excess errors)
FAIL: libgomp.c/parloops-exit-first-loop-alt-3.c execution test
PASS: libgomp.c/parloops-exit-first-loop-alt-4.c (test for excess errors)
PASS: libgomp.c/parloops-exit-first-loop-alt-4.c execution test
PASS: libgomp.c/parloops-exit-first-loop-alt.c (test for excess errors)
PASS: libgomp.c/parloops-exit-first-loop-alt.c execution test
...

The problem is the following. Before transform_to_exit_first_loop, we have loop header bb4, loop latch bb6, and loop exit bb5:
...
  <bb 4>:
  # sum_17 = PHI <1(11), sum_11(6)>
  # ivtmp_24 = PHI <0(11), ivtmp_6(6)>
  i_16 = (int) ivtmp_24;
  _7 = (long unsigned int) i_16;
  _8 = _7 * 4;
  _9 = pretmp_23 + _8;
  _10 = *_9;
  sum_11 = _10 + sum_17;
  i_12 = i_16 + 1;
  i.1_3 = (unsigned int) i_12;
  if (ivtmp_24 < _19)
    goto <bb 6>;
  else
    goto <bb 5>;

  <bb 5>:
  # sum_20 = PHI <sum_11(4), sum_25(8)>
  goto <bb 7>;

  <bb 6>:
  ivtmp_6 = ivtmp_24 + 1;
  goto <bb 4>;
...

After transform_to_exit_first_loop, we still have loop header bb4 and loop latch bb6, but the loop exit is now bb14:
...
  <bb 4>:
  # sum_27 = PHI <1(11), sum_11(6)>
  # ivtmp_28 = PHI <0(11), ivtmp_6(6)>
  if (ivtmp_28 < _19)
    goto <bb 13>;
  else
    goto <bb 14>;

  <bb 13>:
  # sum_17 = PHI <sum_27(4)>
  # ivtmp_24 = PHI <ivtmp_28(4)>
  i_16 = (int) ivtmp_24;
  _7 = (long unsigned int) i_16;
  _8 = _7 * 4;
  _9 = pretmp_23 + _8;
  _10 = *_9;
  sum_11 = _10 + sum_17;
  i_12 = i_16 + 1;
  i.1_3 = (unsigned int) i_12;
  goto <bb 6>;

  <bb 14>:
  # sum_29 = PHI <sum_27(4)>
  ivtmp_30 = _19;
  i_31 = (int) ivtmp_30;
  _32 = (long unsigned int) i_31;
  _33 = _32 * 4;
  _34 = pretmp_23 + _33;
  _35 = *_34;
  sum_36 = _35 + sum_29;
  i_37 = i_31 + 1;
  i.1_38 = (unsigned int) i_37;

  <bb 5>:
  # sum_20 = PHI <sum_36(14), sum_25(8)>
  goto <bb 7>;

  <bb 6>:
  ivtmp_6 = ivtmp_24 + 1;
  goto <bb 4>;
...

A bit later, separate_decls_in_region inserts a .paral_data_store based load in the new exit block, assuming that the exit block has a single predecessor (the loop header bb4):
...
  <bb 14>:
  .paral_data_load.11_42 = &.paral_data_store.10;
  sum_29 = .paral_data_load.11_42->sum.7;
  ivtmp_30 = _19;
  i_31 = (int) ivtmp_30;
  _32 = (long unsigned int) i_31;
  _33 = _32 * 4;
  _34 = pretmp_23 + _33;
  _35 = *_34;
  sum_36 = _35 + sum_29;
  i_37 = i_31 + 1;
  i.1_38 = (unsigned int) i_37;
...

However, with transform_to_exit_first_loop_alt we keep loop latch bb6 and loop exit bb5, but we get a new loop header bb13:
...
  <bb 11>:
  goto <bb 13>;

  <bb 4>:
  # sum_17 = PHI <sum_27(13)>
  # ivtmp_24 = PHI <ivtmp_28(13)>
  i_16 = (int) ivtmp_24;
  _7 = (long unsigned int) i_16;
  _8 = _7 * 4;
  _9 = pretmp_23 + _8;
  _10 = *_9;
  sum_11 = _10 + sum_17;
  i_12 = i_16 + 1;
  i.1_3 = (unsigned int) i_12;
  goto <bb 6>;

  <bb 13>:
  # sum_27 = PHI <sum_11(6), 1(11)>
  # ivtmp_28 = PHI <ivtmp_6(6), 0(11)>
  if (ivtmp_28 < n_4(D))
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 5>:
  # sum_20 = PHI <sum_27(13), sum_25(8)>
  goto <bb 7>;
...

The loop exit bb5 is also reached from bb8, the loop header of the low iteration count loop.
So when separate_decls_in_region inserts a .paral_data_store based load in the exit block, it destroys the value coming from the low iteration count loop:
...
  <bb 5>:
  .paral_data_load.12_32 = &.paral_data_store.11;
  sum_20 = .paral_data_load.12_32->sum.7;
  goto <bb 7>;
...

The fix is probably to make sure that we split the exit edge during transform_to_exit_first_loop_alt, so that the exit block again has a single predecessor (the loop header), as separate_decls_in_region assumes.
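
To make the clobbering concrete, here is a minimal source-level sketch in C of the two control-flow shapes. It is a model, not GCC code: struct paral_data stands in for .paral_data_store, MIN_PARALLEL_ITERS is a made-up threshold for choosing the low iteration count loop, run_parallel_region runs sequentially, and the store is zero-initialized only to keep the model well defined. All of these names and values are assumptions for illustration.

```c
/* Hypothetical stand-in for .paral_data_store: the structure through
   which the parallelized region hands back its reduction result.  */
struct paral_data
{
  unsigned int sum;
};

/* Hypothetical threshold below which the low iteration count loop runs.  */
#define MIN_PARALLEL_ITERS 100u

/* Plain sequential reduction starting from 1, as in the dumps above.  */
static unsigned int
reference_sum (const unsigned int *a, unsigned int n)
{
  unsigned int sum = 1;
  for (unsigned int i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* Stand-in for the parallelized loop (run sequentially in this model).
   It leaves its result in store->sum.  */
static void
run_parallel_region (const unsigned int *a, unsigned int n,
                     struct paral_data *store)
{
  store->sum = reference_sum (a, n);
}

/* Models the failing shape: the load from the store sits in the exit
   block, which is reached from both loops.  */
static unsigned int
sum_load_in_exit_block (const unsigned int *a, unsigned int n)
{
  struct paral_data store = { 0 };  /* zero-init is a modelling choice */
  unsigned int sum = 1;

  if (n >= MIN_PARALLEL_ITERS)
    run_parallel_region (a, n, &store);
  else
    for (unsigned int i = 0; i < n; i++)  /* low iteration count loop */
      sum += a[i];

  /* Shared exit block: overwrites sum on both paths, so the result of
     the low iteration count loop is lost.  */
  sum = store.sum;
  return sum;
}

/* Models the proposed shape: the load sits in a block on the split exit
   edge, which is only reached from the parallelized loop.  */
static unsigned int
sum_load_on_split_edge (const unsigned int *a, unsigned int n)
{
  struct paral_data store = { 0 };
  unsigned int sum = 1;

  if (n >= MIN_PARALLEL_ITERS)
    {
      run_parallel_region (a, n, &store);
      sum = store.sum;  /* block on the split exit edge */
    }
  else
    for (unsigned int i = 0; i < n; i++)  /* low iteration count loop */
      sum += a[i];

  return sum;
}
```

In this model, calling both functions on the three-element array {1, 2, 3} takes the low iteration count path: sum_load_in_exit_block returns the zero-initialized store value, while sum_load_on_split_edge returns 7. For inputs at or above the threshold the two agree.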