http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47653
Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
                 CC|                            |law at redhat dot com
         AssignedTo|unassigned at gcc dot       |spop at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #2 from Jeffrey A. Law <law at redhat dot com> 2011-02-18 15:36:46 UTC ---
This appears to be a bug in the graphite transformations. Prior to graphite
transformations we have the following function with two nested loops:

  # BLOCK 2 freq:139
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  # .MEM_13 = VDEF <.MEM_12(D)>
  saved_stack.1_3 = __builtin_stack_save ();
  # .MEM_14 = VDEF <.MEM_13>
  x.0_4 = __builtin_alloca (256);
  goto <bb 8>;
  # SUCC: 8 [100.0%]  (fallthru,exec)

  # BLOCK 3 freq:7901
  # PRED: 4 [88.9%]  (true,exec)
  # SUCC: 4 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:8889
  # PRED: 3 [100.0%]  (fallthru,exec) 9 [100.0%]  (fallthru,exec)
  # j_23 = PHI <j_7(3), j_5(9)>
  # .MEM_24 = PHI <.MEM_15(3), .MEM_6(9)>
  # .MEM_15 = VDEF <.MEM_24>
  *x.0_4[j_5][j_23] = 0;
  j_7 = j_23 + 1;
  if (j_7 <= 7)
    goto <bb 3>;
  else
    goto <bb 5>;
  # SUCC: 3 [88.9%]  (true,exec) 5 [11.1%]  (false,exec)

  # BLOCK 5 freq:988
  # PRED: 4 [11.1%]  (false,exec)
  # .MEM_19 = PHI <.MEM_15(4)>
  # SUCC: 6 [100.0%]  (fallthru,exec)

  # BLOCK 6 freq:1111
  # PRED: 5 [100.0%]  (fallthru,exec) 8 [11.1%]  (false,exec)
  # .MEM_25 = PHI <.MEM_19(5), .MEM_6(8)>
  i_8 = j_5 + 1;
  if (i_8 <= 7)
    goto <bb 7>;
  else
    goto <bb 10>;
  # SUCC: 7 [88.9%]  (true,exec) 10 [11.1%]  (false,exec)

  # BLOCK 7 freq:988
  # PRED: 6 [88.9%]  (true,exec)
  # SUCC: 8 [100.0%]  (fallthru,exec)

  # BLOCK 8 freq:1111
  # PRED: 7 [100.0%]  (fallthru,exec) 2 [100.0%]  (fallthru,exec)
  # j_5 = PHI <i_8(7), 0(2)>
  # .MEM_6 = PHI <.MEM_25(7), .MEM_14(2)>
  if (j_5 <= 7)
    goto <bb 9>;
  else
    goto <bb 6>;
  # SUCC: 9 [88.9%]  (true,exec) 6 [11.1%]  (false,exec)

  # BLOCK 9 freq:988
  # PRED: 8 [88.9%]  (true,exec)
  goto <bb 4>;
  # SUCC: 4 [100.0%]  (fallthru,exec)

  # BLOCK 10 freq:139
  # PRED: 6 [11.1%]  (false,exec)
  # .MEM_16 = VDEF <.MEM_25>
  __builtin_stack_restore (saved_stack.1_3);
  return 0;
  # SUCC: EXIT [100.0%]

Pretty simple stuff. Graphite transforms it into:

  # BLOCK 2 freq:313
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  # .MEM_13 = VDEF <.MEM_12(D)>
  saved_stack.1_3 = __builtin_stack_save ();
  # .MEM_14 = VDEF <.MEM_13>
  x.0_4 = __builtin_alloca (256);
  # SUCC: 3 [100.0%]  (fallthru)

  # BLOCK 3 freq:2500
  # PRED: 2 [100.0%]  (fallthru) 8 [100.0%]  (fallthru,dfs_back)
  # graphite_IV.5_11 = PHI <0(2), graphite_IV.5_1(8)>
  # .MEM_33 = PHI <.MEM_14(2), .MEM_34(8)>
  D.2701_22 = 8 - graphite_IV.5_11;
  D.2702_21 = D.2701_22 > 0;
  if (D.2702_21 != 0)
    goto <bb 4>;
  else
    goto <bb 7>;
  # SUCC: 4 [50.0%]  (true) 7 [50.0%]  (false)

  # BLOCK 4 freq:1250
  # PRED: 3 [50.0%]  (true)
  D.2704_10 = (<unnamed-signed:64>) graphite_IV.5_11;
  D.2705_2 = D.2704_10 * 4294967295;
  D.2706_18 = D.2705_2 + 7;
  # SUCC: 5 [100.0%]  (fallthru)

  # BLOCK 5 freq:10000
  # PRED: 4 [100.0%]  (fallthru) 6 [100.0%]  (fallthru,dfs_back)
  # graphite_IV.6_17 = PHI <0(4), graphite_IV.6_26(6)>
  # .MEM_35 = PHI <.MEM_33(4), .MEM_27(6)>
  D.2707_28 = (int) graphite_IV.5_11;
  D.2708_29 = (int) graphite_IV.5_11;
  D.2709_30 = (int) graphite_IV.6_17;
  D.2710_31 = D.2708_29 + D.2709_30;
  # .MEM_27 = VDEF <.MEM_35>
  *x.0_4[D.2707_28][D.2710_31] = 0;
  graphite_IV.6_26 = graphite_IV.6_17 + 1;
  if (graphite_IV.6_17 < D.2706_18)
    goto <bb 6>;
  else
    goto <bb 7>;
  # SUCC: 6 [87.5%]  (true) 7 [12.5%]  (loop_exit,false)

  # BLOCK 6 freq:8750
  # PRED: 5 [87.5%]  (true)
  goto <bb 5>;
  # SUCC: 5 [100.0%]  (fallthru,dfs_back)

  # BLOCK 7 freq:2500
  # PRED: 5 [12.5%]  (loop_exit,false) 3 [50.0%]  (false)
  # .MEM_34 = PHI <.MEM_27(5), .MEM_33(3)>
  graphite_IV.5_1 = graphite_IV.5_11 + 1;
  if (graphite_IV.5_11 < 7)
    goto <bb 8>;
  else
    goto <bb 9>;
  # SUCC: 8 [87.5%]  (true) 9 [12.5%]  (loop_exit,false)

  # BLOCK 8 freq:2188
  # PRED: 7 [87.5%]  (true)
  goto <bb 3>;
  # SUCC: 3 [100.0%]  (fallthru,dfs_back)

  # BLOCK 9 freq:313
  # PRED: 7 [12.5%]  (loop_exit,false)
  # .MEM_32 = PHI <.MEM_34(7)>
  # .MEM_16 = VDEF <.MEM_32>
  __builtin_stack_restore (saved_stack.1_3);
  return 0;
  # SUCC: EXIT [100.0%]

Of particular interest is the assignment to D.2705_2 in block #4:

  D.2705_2 = D.2704_10 * 4294967295;
  D.2706_18 = D.2705_2 + 7;

That makes absolutely no sense. Particularly since D.2706_18 is later used to
control loop termination in BB5. Note D.2706 is a 64bit type, so we really are
multiplying by 4294967295. Needless to say this causes the loop termination
condition to do something different than was originally intended.
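
To make the arithmetic concrete, here is a minimal C sketch of the bound computed in block #4. It is not taken from the testcase or from graphite itself; the function names are hypothetical. It assumes the intended bound was -1 * i + 7, since in the pre-graphite dump the inner index starts at the outer index and runs up to 7. It also notes that 4294967295 is the value you get when a 32-bit -1 is zero-extended to 64 bits, which is one plausible reading of where the constant comes from, not something confirmed in this comment.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed intent: the zero-based inner IV should run 0 .. 7 - i,
   because the original inner loop runs from i up to 7. */
int64_t intended_bound(int64_t i)
{
    return -1 * i + 7;
}

/* Bound as it appears in the transformed GIMPLE:
   D.2705_2 = D.2704_10 * 4294967295;  D.2706_18 = D.2705_2 + 7; */
int64_t emitted_bound(int64_t i)
{
    return i * INT64_C(4294967295) + 7;
}

/* A 32-bit -1 reinterpreted as unsigned and then widened to 64 bits.
   This is only a hypothesis about the origin of the constant. */
int64_t widened_minus_one(void)
{
    return (int64_t)(uint32_t)(-1);
}
```

With these definitions the two bounds agree only for i = 0, where both are 7. For i = 1 the intended bound is 6 while the emitted one is 4294967302, so the test in BB5 stays true far longer than the original inner loop would have run.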