Hello, at the moment, any pass that needs to process memory references are complicated (or restricted to handling just a limited set of cases) by the need to interpret the quite complex representation of memory references that we have in gimple. For example, there are about 1000 of lines of quite convoluted code in tree-data-ref.c and about 500 lines in tree-ssa-loop-ivopts.c dealing with parsing and analysing memory references.
I would like to propose (and possibly also work on implementing) using a more uniform representation, described in more details below. The proposal is based on the previous discussions (http://gcc.gnu.org/ml/gcc/2006-06/msg00295.html) and on what I learned about the way memory references are represented in ICC. It also subsumes the patches of Daniel to make p[i] (where p is pointer) use ARRAY_REFs/MEM_REFs. I am not sure whether someone does not already work on something similar, e.g. with respect to LTO (where the mentioned discussion started), or gimple-tuples branch? Proposal: For each memory reference, we remember the following information: -- base of the reference -- constant offset -- vector of indices -- type of the accessed location -- original tree of the memory reference (or another summary of the structure of the access, for aliasing purposes) -- flags for each index, we remeber -- lower and upper bound -- step -- value of the index The address of the reference can be computed as base + offset + sum_{idx} offsetof(idx) where offsetof(idx) = (value - lower) * step For example, a.x[i][j].y would be represented as base = &a offset = offsetof (x) + offsetof (y); indices: lower = 0 upper = ? step = sizeof (a.x[i]) value = i lower = 0 upper = ? step = sizeof (a.x[j]) value = j p->x would be represented as base = p; offset = offsetof (x); indices: empty p[i] as base = p; offset = 0 indices: lower = 0 upper = ? step = sizeof (p[i]) value = i Remarks: -- it would be guaranteed that the indices of each memory reference are independent, i.e., that &ref[idx1][idx2] == &ref[idx1'][idx2'] only if idx1 == idx1' and idx2 = idx2'; this is important for dependency analysis (and for this reason we also need to remember the list of indices, rather than trying to reconstruct them from the address). -- it would be guaranteed that the computations of the address do not overflow. -- possibly there could be a canonicalization pass that, given for (p = q; p < q + 100; p++) p->x = ... {represented the way described above} would transform it to for (p = q, i = 0; p < q + 100; p++, i++) {base = q, offset = offsetof(x), indices: lower = 0 upper = ? step = sizeof (*p) value = i} so that dependence analysis and friends do not have to distinguish between accesses through pointers and arrays Zdenek