Proposal: changing representation of memory references

Zdenek Dvorak Wed, 04 Apr 2007 07:35:20 -0700

Hello,

at the moment, any pass that needs to process memory references are
complicated (or restricted to handling just a limited set of cases) by
the need to interpret the quite complex representation of memory
references that we have in gimple.  For example, there are about 1000 of
lines of quite convoluted code in tree-data-ref.c and about 500 lines in
tree-ssa-loop-ivopts.c dealing with parsing and analysing memory
references.


I would like to propose (and possibly also work on implementing) using a
more uniform representation, described in more details below.  The
proposal is based on the previous discussions
(http://gcc.gnu.org/ml/gcc/2006-06/msg00295.html) and on what I learned
about the way memory references are represented in ICC.
It also subsumes the patches of Daniel to make p[i] (where p is pointer)
use ARRAY_REFs/MEM_REFs.

I am not sure whether someone does not already work on something
similar, e.g. with respect to LTO (where the mentioned discussion
started), or gimple-tuples branch?

Proposal:

For each memory reference, we remember the following information:

-- base of the reference
-- constant offset
-- vector of indices
-- type of the accessed location
-- original tree of the memory reference (or another summary of the
      structure of the access, for aliasing purposes)
-- flags

for each index, we remeber
-- lower and upper bound
-- step
-- value of the index

The address of the reference can be computed as

base + offset + sum_{idx} offsetof(idx)

where offsetof(idx) = (value - lower) * step

For example, a.x[i][j].y  would be represented as
base = &a
offset = offsetof (x) + offsetof (y);
indices:
  lower = 0 upper = ? step = sizeof (a.x[i]) value = i
  lower = 0 upper = ? step = sizeof (a.x[j]) value = j

p->x would be represented as
base = p;
offset = offsetof (x);
indices: empty

p[i] as
base = p;
offset = 0
indices: 
  lower = 0 upper = ? step = sizeof (p[i]) value = i

Remarks:
-- it would be guaranteed that the indices of each memory reference are
   independent, i.e., that &ref[idx1][idx2] == &ref[idx1'][idx2'] only
   if idx1 == idx1' and idx2 = idx2'; this is important for dependency
   analysis (and for this reason we also need to remember the list of
   indices, rather than trying to reconstruct them from the address).
-- it would be guaranteed that the computations of the address do not
   overflow.
-- possibly there could be a canonicalization pass that, given

   for (p = q; p < q + 100; p++)
     p->x = ...  {represented the way described above}

   would transform it to 

   for (p = q, i = 0; p < q + 100; p++, i++)
     {base = q, offset = offsetof(x), indices: lower = 0 upper = ? step = 
sizeof (*p) value = i}

   so that dependence analysis and friends do not have to distinguish
   between accesses through pointers and arrays

Zdenek

Proposal: changing representation of memory references

Reply via email to