http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52082
Bug #: 52082
Summary: Memory loads not rematerialized
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Keywords: missed-optimization, ra
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ja...@gcc.gnu.org
CC: vmaka...@gcc.gnu.org
Target: x86_64-linux

On the following testcase at -O2 (distilled from genautomata.c):

struct S { unsigned long *s1; struct S *s2; };
int v1 __attribute__((visibility ("hidden")));
struct T { int a, b, c; } *v2 __attribute__((visibility ("hidden")));
struct S **v3 __attribute__((visibility ("hidden")));
struct S **v4 __attribute__((visibility ("hidden")));

int __attribute__((noinline, noclone))
foo (unsigned long *x, unsigned long *y, int z)
{
  int j, k, l;
  unsigned int i;
  struct S *m;
  for (j = 0; j < v1; j++)
    if (y[j])
      for (i = 0; i < 8 * sizeof (unsigned long); i++)
        if ((y[j] >> i) & 1)
          {
            k = j * 8 * sizeof (unsigned long) + i;
            if (k >= v2->c)
              break;
            for (m = (z ? v4 [k] : v3 [k]); m != ((void *)0); m = m->s2)
              {
                for (l = 0; l < v1; l++)
                  if ((x [l] & m->s1 [l]) != m->s1 [l] && m->s1 [l])
                    break;
                if (l >= v1)
                  return 0;
              }
          }
  return 1;
}

Tree LIM moves the loads from v2/v3/v4 before the loop, but unfortunately the
register pressure is high, so the pseudos holding the v3/v4 pointers don't get
a hard register and are immediately spilled to the stack.

I wonder whether we couldn't instead just rematerialize them and put the
original MEM loads back into the loop. That assumes they don't alias with
anything on the way, but that must be the case here, since LIM moved them out
in the first place; after all, this loop doesn't have any MEM stores at all.
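To illustrate the idea at the source level only, here is a minimal sketch of the two forms being compared. It is not what GCC emits and not a proposed patch. The names g3, g4, count_hoisted, count_remat and forms_agree are hypothetical stand-ins introduced for this example, and the loop body is reduced to counting list nodes. An optimizing compiler may well turn both functions into the same code; the point is only that, with no stores in the loop, reloading the global inside the loop gives the same result as reading a copy saved before the loop.

```c
#include <assert.h>
#include <stddef.h>

struct S { unsigned long *s1; struct S *s2; };

/* Hypothetical stand-ins for the hidden globals v3/v4 of the testcase.  */
struct S **g3;
struct S **g4;

/* Hoisted form, roughly what tree LIM produces: both globals are loaded
   once before the loop.  Under high register pressure the copies p3/p4
   may end up in stack slots, so each use reloads them from the stack.  */
int
count_hoisted (int z, int n)
{
  struct S **p3 = g3;
  struct S **p4 = g4;
  struct S *m;
  int k, cnt = 0;
  for (k = 0; k < n; k++)
    for (m = (z ? p4[k] : p3[k]); m != NULL; m = m->s2)
      cnt++;
  return cnt;
}

/* Rematerialized form: the original memory load is repeated inside the
   loop instead of reading a spilled copy.  This is only valid because
   nothing in the loop stores to memory.  */
int
count_remat (int z, int n)
{
  struct S *m;
  int k, cnt = 0;
  for (k = 0; k < n; k++)
    for (m = (z ? g4[k] : g3[k]); m != NULL; m = m->s2)
      cnt++;
  return cnt;
}

/* Sanity helper: both forms must agree, since the loop stores nothing.  */
int
forms_agree (int z, int n)
{
  int a = count_hoisted (z, n);
  assert (a == count_remat (z, n));
  return a;
}
```

In the rematerialized form each use costs one load from the global's address rather than one load from a stack slot, and the store that spills the pseudo before the loop goes away.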