Modeless const_ints have been a recurring source of problems.  The idea
behind keeping them modeless was presumably that we wanted to allow
const_ints to be shared between modes.  However, in practice, every
"real" const_int does have a conceptual mode, and the sign-extension
rules mean that something like 0x8000 can't be shared directly as an
HI and SI integer.
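
As a rough illustration of the sign-extension point (a standalone sketch,
not GCC code): sext_for_width is a hypothetical helper that keeps the low
BITS bits of a value and sign-extends them, which is roughly the
canonical form a const_int takes for a mode of that width.

```c
#include <stdint.h>

/* Hypothetical helper, not part of GCC: return VALUE truncated to BITS
   bits and then sign-extended to 64 bits.  */
static int64_t
sext_for_width (uint64_t value, unsigned bits)
{
  if (bits >= 64)
    return (int64_t) value;

  uint64_t mask = ((uint64_t) 1 << bits) - 1;
  uint64_t sign = (uint64_t) 1 << (bits - 1);
  uint64_t low = value & mask;

  /* XOR-and-subtract sign extension: flip the sign bit, then subtract it.  */
  return (int64_t) (low ^ sign) - (int64_t) sign;
}
```

With a 16-bit (HI-like) width, 0x8000 becomes -32768; with a 32-bit
(SI-like) width it stays 32768.  The same bit pattern therefore needs two
different stored values, which is why one rtx can't serve both modes.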

Others have referred to modeless const_ints as an historical mistake,
so I'd like to try moving to const_ints with a mode.  The first stage
of the plan is to replace calls to GEN_INT with calls to gen_int_mode.

Unfortunately, const_ints are sometimes used as a way of storing an
optional HOST_WIDE_INT (i.e. None | Some n).  These sorts of const_int
don't really represent rtl expressions, and so don't really have a
natural mode.

This patch deals with one such use of const_ints: MEM_SIZE and MEM_OFFSET.
Nothing might come of the wider grand plan -- and even if it does, it might
be rejected as a bad idea -- but I think MEM_SIZE and MEM_OFFSET are
worth changing regardless.

MEM_SIZE is defined as:

  /* For a MEM rtx, the size in bytes of the MEM, if known, as an RTX that
     is always a CONST_INT.  */
  #define MEM_SIZE(RTX)                                                   \
  (MEM_ATTRS (RTX) != 0 ? MEM_ATTRS (RTX)->size                           \
   : GET_MODE (RTX) != BLKmode ? GEN_INT (GET_MODE_SIZE (GET_MODE (RTX))) \
   : 0)

But it seems like a bad idea to have GEN_INT embedded in such an
innocuous-looking macro.  The typical use case is to test whether
MEM_SIZE is null, then extract its INTVAL:

  sizex = (!MEM_P (rtlx) ? (int) GET_MODE_SIZE (GET_MODE (rtlx))
           : MEM_SIZE (rtlx) ? INTVAL (MEM_SIZE (rtlx))
           : -1);

which in the attribute-less case means two pointless calls to GEN_INT.
Loops like:

        for (byte = off; byte < off + INTVAL (MEM_SIZE (mem)); byte++)

are not necessarily as cheap as they might seem.  The same applies
to the less-frequently-used MEM_OFFSET.

One fix might be to give every MEM a mem_attrs structure.  MEMs without
them are pretty rare these days anyway.  However, various parts of
the compiler change the mode in-place, so that would need a bit
more surgery.  I don't really want to do something so potentially
invasive.  (At the same time, I don't want to make it harder to
do that in future.)

Patch 1 instead adds a global mem_attrs for each mode.  It also adds
a function get_mem_attrs that always returns an attributes structure,
using the new array where necessary.
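
The shape of that lookup might be sketched as follows.  This is a toy
model, not the patch itself: every toy_* name is hypothetical, the mode
list is cut down to four entries, and the attributes hold only a size
(as a flag plus an integer, for brevity) rather than the real mem_attrs
fields.

```c
#include <stddef.h>

/* Toy stand-ins for machine modes; TOY_BLK plays the role of BLKmode.  */
enum toy_mode { TOY_BLK, TOY_QI, TOY_HI, TOY_SI, TOY_NUM_MODES };

/* Toy attributes: only the size, with a flag saying whether it is known.  */
struct toy_mem_attrs { int size_known_p; long size; };

/* Toy MEM: a mode plus an optional attributes pointer.  */
struct toy_mem { enum toy_mode mode; const struct toy_mem_attrs *attrs; };

/* One default attributes structure per mode.  The block mode has no
   known size; the others default to the size of the mode in bytes.  */
static const struct toy_mem_attrs toy_mode_mem_attrs[TOY_NUM_MODES] = {
  { 0, 0 },  /* TOY_BLK */
  { 1, 1 },  /* TOY_QI */
  { 1, 2 },  /* TOY_HI */
  { 1, 4 },  /* TOY_SI */
};

/* Always return an attributes structure, falling back to the per-mode
   default when MEM has none of its own.  */
static const struct toy_mem_attrs *
toy_get_mem_attrs (const struct toy_mem *mem)
{
  return mem->attrs ? mem->attrs : &toy_mode_mem_attrs[mem->mode];
}
```

Because the fallback is indexed by the MEM's current mode, code that
changes the mode in place still sees the right defaults without any
extra bookkeeping.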

Patch 2 uses get_mem_attrs to simplify the internals of emit-rtl.c,
and to make it easier to change mem_attrs in future.

As it happens, nothing really seems to want MEM_SIZE or MEM_OFFSET
as an rtx.  All users seem to go straight to the INTVAL. Patches 3 and 4
therefore change the interface so that:

   MEM_SIZE_KNOWN_P (x)
   MEM_OFFSET_KNOWN_P (x)

say whether the MEM_SIZE and MEM_OFFSET are known, while MEM_SIZE
and MEM_OFFSET give their values as HOST_WIDE_INTs.
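
A caller written against that style of interface might look roughly like
this.  It is a self-contained sketch: the SKETCH_* macros and the
sketch_mem type are hypothetical stand-ins for the real accessors, and
long stands in for HOST_WIDE_INT.

```c
/* Hypothetical stand-in for a MEM with a size attribute.  */
struct sketch_mem { int size_known_p; long size; };

/* Stand-ins for MEM_SIZE_KNOWN_P and MEM_SIZE: a predicate plus a plain
   integer, with no rtx in between.  */
#define SKETCH_MEM_SIZE_KNOWN_P(M) ((M)->size_known_p)
#define SKETCH_MEM_SIZE(M) ((M)->size)

/* The "size if known, else -1" idiom from the sizex example, without
   any INTVAL or hidden GEN_INT.  */
static long
sketch_size_or_minus_one (const struct sketch_mem *mem)
{
  return SKETCH_MEM_SIZE_KNOWN_P (mem) ? SKETCH_MEM_SIZE (mem) : -1;
}

/* Count the bytes in [OFF, OFF + size), or return -1 if the size is
   unknown.  The bound is a plain integer read, so evaluating it is cheap.  */
static long
sketch_count_bytes (const struct sketch_mem *mem, long off)
{
  if (!SKETCH_MEM_SIZE_KNOWN_P (mem))
    return -1;

  long end = off + SKETCH_MEM_SIZE (mem);
  long count = 0;
  for (long byte = off; byte < end; byte++)
    count++;
  return count;
}
```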

Finally, patch 5 actually changes the mem_attrs representation.  This
might or might not be considered a Good Thing; I'll discuss it a bit
more in the patch's covering message.  I'd be happy for just patches
1-4 to go in at this stage if that seems better.

I wondered whether there should be special HOST_WIDE_INT values to mean
"no size known" or "no offset known".  One idea was to have an offset
of -1 mean "not known", but negative offsets are apparently acceptable
in some cases:

  /* If the base decl is a parameter we can have negative MEM_OFFSET in
     case of promoted subregs on bigendian targets.  Trust the MEM_EXPR
     here.  */
  if (INTVAL (MEM_OFFSET (mem)) < 0
      && ((INTVAL (MEM_SIZE (mem)) + INTVAL (MEM_OFFSET (mem)))
          * BITS_PER_UNIT) == ref->size)
    return true;

We could instead use the minimum HOST_WIDE_INT as the offset.  But that
comes back to Joseph's point (from other threads) that, while C doesn't
really cope properly with objects that are bigger than half the address
space, glibc does actually allow such objects.  The same concern applies
to MEM_SIZE.

The fact that most users treat the size as signed rather than unsigned
might mean we don't always cope properly with larger objects in GCC.
Even so, it seemed better to avoid any such assumptions in the
general case, hence the MEM_*_KNOWN_P stuff above.
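
To make the sentinel problem concrete, here is a small sketch comparing
the two encodings.  All names are hypothetical and long stands in for
HOST_WIDE_INT; it only illustrates the trade-off and isn't code from the
patches.

```c
#include <stdbool.h>

/* Encoding 1: reserve -1 to mean "offset not known".  */
#define SENTINEL_UNKNOWN (-1L)

static bool
sentinel_offset_known_p (long encoded)
{
  return encoded != SENTINEL_UNKNOWN;
}

/* Encoding 2: keep a separate flag, leaving every integer value usable
   as a real offset.  */
struct flagged_offset { bool known_p; long offset; };

static struct flagged_offset
make_known_offset (long offset)
{
  struct flagged_offset o = { true, offset };
  return o;
}

static struct flagged_offset
make_unknown_offset (void)
{
  struct flagged_offset o = { false, 0 };
  return o;
}
```

Under the first encoding a genuine offset of -1 is indistinguishable from
"unknown"; under the second, make_known_offset (-1) and
make_unknown_offset () stay distinct.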

Tested on x86_64-linux-gnu (all languages).  Also tested by compiling
gcc and g++ on:

        arm-linux-gnueabi
        h8300-elf
        x86_64-linux-gnu
        mips-linux-gnu
        powerpc-linux-gnu
        s390-linux-gnu
        sh-elf

and making sure that there were no changes in the assembly generated
for gcc.dg, g++.dg and gcc.c-torture.

Richard
