https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81611
Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |law at redhat dot com

--- Comment #2 from Jeffrey A. Law <law at redhat dot com> ---
So we get good code just prior to this:

    Author: rguenth <rguenth@138bc75d-0d04-0410-961f-82ee72b054a4>
    Date:   Tue Apr 21 12:52:43 2015 +0000

        2015-04-21  Richard Biener  <rguent...@suse.de>

                PR tree-optimization/65650
                * tree-ssa-ccp.c (valid_lattice_transition): Allow lattice
                transitions involving copies.
                (set_lattice_value): Adjust for copy lattice state.
        [ ... ]

That change results in propagation of a copy (no surprise there).  This
results in IVopts making some different choices.  Prior to the change the
loop looks like this in the .optimized dump:

  # x_1 = PHI <x_4(D)(2), x_13(3)>
  # ivtmp.7_16 = PHI <ivtmp.7_7(2), ivtmp.7_15(3)>
  str_2 = (char *) ivtmp.7_16;
  _9 = x_1 & 1;
  _10 = _9 + 48;
  _11 = (char) _10;
  MEM[base: str_2, offset: 0B] = _11;
  ivtmp.7_15 = ivtmp.7_16 + 1;
  x_13 = x_1 >> 1;
  if (x_13 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

Intuitively we can see the relationship between STR and IVTMP, and the likely
post-inc opportunity at the memory reference and subsequent increment of
IVTMP.

If we look at the loop after the referenced change we have:

  # x_1 = PHI <x_4(D)(2), x_13(3)>
  # str_2 = PHI <str_5(D)(2), str_8(3)>
  str_8 = str_2 + 1;
  _9 = x_1 & 1;
  _10 = _9 + 48;
  _11 = (char) _10;
  _16 = str_8 + 65535;
  MEM[base: _16, offset: 0B] = _11;
  x_13 = x_1 >> 1;
  if (x_13 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

So we no longer have the IV, just STR, and it's a lot harder to recover the
auto-inc opportunity at the memory reference.  Anyway, that's the point where
it looks to me like things start to go off the rails.
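For reference, a source loop of roughly this shape would produce the dumps
above.  This is my reconstruction from the GIMPLE, not necessarily the exact
testcase attached to the PR; the function name is made up:

```c
#include <string.h>

/* Hypothetical reconstruction of the loop behind the dumps above: emit the
   bits of X, least-significant bit first, as ASCII digits into STR.  The
   post-increment of STR is the auto-inc opportunity being discussed.  */
static void bits_lsb_first(unsigned x, char *str)
{
  do
    {
      *str++ = '0' + (x & 1);   /* _9 = x_1 & 1; _10 = _9 + 48; store   */
      x >>= 1;                  /* x_13 = x_1 >> 1                      */
    }
  while (x != 0);               /* if (x_13 != 0) goto <bb 3>           */
  *str = 0;                     /* the post-loop MEM[... + 1B] = 0      */
}
```

On AVR, `*str++` maps naturally onto the ST X+ post-increment addressing
mode, which is why losing the auto-inc here costs instructions.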
If we walk forward to the trunk today and look at the .expand dump we have:

  # x_5 = PHI <x_8(D)(2), x_15(3)>
  # str_6 = PHI <str_9(D)(2), str_17(3)>
  _1 = x_5 & 1;
  _2 = _1 + 48;
  str_11 = str_6 + 1;
  _4 = (char) _2;
  _16 = str_6;
  MEM[base: _16, offset: 0B] = _4;
  x_13 = x_5 >> 1;
  x_15 = x_13;
  str_17 = str_11;
  if (x_5 > 1)
    goto <bb 3>; [85.00%]
  else
    goto <bb 4>; [15.00%]

  ;; basic block 4, loop depth 0
  ;;  pred:       3
  MEM[(char *)str_6 + 1B] = 0;
  return;

I think part of the problem is that we need both str_6 and str_11 -- they
have different values and so they conflict.  The two MEMs could potentially
be rewritten in terms of str_11.  With the obvious copy-props we'd have
something like this:

  # x_5 = PHI <x_8(D)(2), x_13(3)>
  # str_6 = PHI <str_9(D)(2), str_11(3)>
  _1 = x_5 & 1;
  _2 = _1 + 48;
  str_11 = str_6 + 1;
  _4 = (char) _2;
  MEM[base: str_11, offset: -1B] = _4;
  x_13 = x_5 >> 1;
  if (x_5 > 1)
    goto <bb 3>; [85.00%]
  else
    goto <bb 4>; [15.00%]

  ;; basic block 4, loop depth 0
  ;;  pred:       3
  MEM[(char *)str_11, offset: 0B] = 0;
  return;

That ought to allow str_6 and str_11 to coalesce.  The question then becomes
whether we can recover the auto-inc -- I'm not sure the auto-inc code is
good enough to see it in that form.

Most importantly, while this BZ is filed against the AVR target, it seems to
me to be clearly a generic issue.
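At the source level, the rewrite being proposed amounts to the equivalence
between the two forms below (a sketch in C; the function names are mine, and
this only illustrates the shapes, not the pass itself).  An auto-inc pass
would have to recognize that the second form is still a post-increment store:

```c
/* Analogue of the pre-IVopts shape: store at p, then p = p + 1.
   Maps directly onto a post-increment addressing mode.  */
static char *store_then_inc(char *p, char c)
{
  *p = c;            /* MEM[base: str_2, offset: 0B] = _11  */
  p = p + 1;         /* ivtmp.7_15 = ivtmp.7_16 + 1         */
  return p;
}

/* Analogue of the copy-propped shape: p = p + 1, then store at p[-1].
   Semantically identical, but the post-inc is no longer syntactically
   obvious -- the store's base is the already-incremented pointer.  */
static char *inc_then_store(char *p, char c)
{
  p = p + 1;         /* str_11 = str_6 + 1                   */
  p[-1] = c;         /* MEM[base: str_11, offset: -1B] = _4  */
  return p;
}
```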