On 4/4/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:
Hello,
at the moment, any pass that needs to process memory references are
complicated (or restricted to handling just a limited set of cases) by
the need to interpret the quite complex representation of memory
references that we have in gimple. For example, there are about 1000 of
lines of quite convoluted code in tree-data-ref.c and about 500 lines in
tree-ssa-loop-ivopts.c dealing with parsing and analysing memory
references.
I would like to propose (and possibly also work on implementing) using a
more uniform representation, described in more details below. The
proposal is based on the previous discussions
(http://gcc.gnu.org/ml/gcc/2006-06/msg00295.html) and on what I learned
about the way memory references are represented in ICC.
It also subsumes the patches of Daniel to make p[i] (where p is pointer)
use ARRAY_REFs/MEM_REFs.
I am not sure whether someone does not already work on something
similar, e.g. with respect to LTO (where the mentioned discussion
started), or gimple-tuples branch?
Proposal:
For each memory reference, we remember the following information:
-- base of the reference
-- constant offset
-- vector of indices
-- type of the accessed location
-- original tree of the memory reference (or another summary of the
structure of the access, for aliasing purposes)
-- flags
for each index, we remeber
-- lower and upper bound
-- step
-- value of the index
The address of the reference can be computed as
base + offset + sum_{idx} offsetof(idx)
where offsetof(idx) = (value - lower) * step
For example, a.x[i][j].y would be represented as
base = &a
offset = offsetof (x) + offsetof (y);
indices:
lower = 0 upper = ? step = sizeof (a.x[i]) value = i
lower = 0 upper = ? step = sizeof (a.x[j]) value = j
p->x would be represented as
base = p;
offset = offsetof (x);
indices: empty
p[i] as
base = p;
offset = 0
indices:
lower = 0 upper = ? step = sizeof (p[i]) value = i
Remarks:
-- it would be guaranteed that the indices of each memory reference are
independent, i.e., that &ref[idx1][idx2] == &ref[idx1'][idx2'] only
if idx1 == idx1' and idx2 = idx2'; this is important for dependency
analysis (and for this reason we also need to remember the list of
indices, rather than trying to reconstruct them from the address).
-- it would be guaranteed that the computations of the address do not
overflow.
-- possibly there could be a canonicalization pass that, given
for (p = q; p < q + 100; p++)
p->x = ... {represented the way described above}
would transform it to
for (p = q, i = 0; p < q + 100; p++, i++)
{base = q, offset = offsetof(x), indices: lower = 0 upper = ? step =
sizeof (*p) value = i}
so that dependence analysis and friends do not have to distinguish
between accesses through pointers and arrays
This looks like a very complicated (though very generic) way of
specifying a memory
reference. Last time we discussed this I proposed to just have BASE, OFFSET
and accessed TYPE (and an alias tag of the memory reference). I realize this
doesn't cover accesses to multi-dimensional arrays, but using the
full-blown scheme
in place of a simple INDIRECT_REF makes memory usage for the trees go up
noticable.
Richard.