Modeless const_ints have been a recurring source of problems. The idea behind keeping them modeless was presumably that we wanted to allow const_ints to be shared between modes. However, in practice, every "real" const_int does have a conceptual mode, and the sign-extension rules mean that something like 0x8000 can't be shared directly as an HI and SI integer.
Others have referred to modeless const_ints as an historical mistake, so I'd like to try to moving to const_ints with mode. The first stage of the plan is to replace calls to GEN_INT with calls to gen_int_mode. Unfortunately, const_ints are sometimes used as a way of storing an optional HOST_WIDE_INT (i.e. None | Some n). These sorts of const_int don't really represent rtl expressions, and so don't really have a natural mode. This patch deals with one such use of const_ints: MEM_SIZE and MEM_OFFSET. Nothing might come of the wider grand plan -- and even if it does, it might be rejected as a bad idea -- but I think MEM_SIZE and MEM_OFFSET are worth changing regardless. MEM_SIZE is defined as: /* For a MEM rtx, the size in bytes of the MEM, if known, as an RTX that is always a CONST_INT. */ #define MEM_SIZE(RTX) \ (MEM_ATTRS (RTX) != 0 ? MEM_ATTRS (RTX)->size \ : GET_MODE (RTX) != BLKmode ? GEN_INT (GET_MODE_SIZE (GET_MODE (RTX))) \ : 0) But it seems like a bad idea to have GEN_INT embedded in such an inocuous-looking macro. The typical use case is to test whether MEM_SIZE is null, then extract its INTVAL: sizex = (!MEM_P (rtlx) ? (int) GET_MODE_SIZE (GET_MODE (rtlx)) : MEM_SIZE (rtlx) ? INTVAL (MEM_SIZE (rtlx)) : -1); which in the attribute-less case means two pointless calls to GEN_INT. Loops like: for (byte = off; byte < off + INTVAL (MEM_SIZE (mem)); byte++) are not necessarily as cheap as they might seem. The same applies to the less-frequently-used MEM_OFFSET. One fix might be to give every MEM a mem_attrs structure. MEMs without them are pretty rare these days anyway. However, various parts of the compiler change the mode in-place, so that would need a bit more surgery. I don't really want to do something so potentially invasive. (At the same time, I don't want to make it harder to do that in future.) Patch 1 instead adds a global mem_attrs for each mode. It also adds a function get_mem_attrs that always returns an attributes structure, using the new array where necessary. Patch 2 uses gen_mem_attrs to simplify the internals of emit-rtl.c, and to make it easier to change mem_attrs in future. As it happens, nothing really seems to want MEM_SIZE or MEM_OFFSET as an rtx. All users seem to go straight to the INTVAL. Patches 3 and 4 therefore change the interface so that: MEM_SIZE_KNOWN_P (x) MEM_OFFSET_KNOWN_P (x) says whether the MEM_SIZE and MEM_OFFSET are known, while MEM_SIZE and MEM_OFFSET give their values as HOST_WIDE_INTs. Finally, patch 5 actually changes the mem_attrs representation. This might or might not be considered a Good Thing; I'll discuss it a bit more in the patch's covering message. I'd be happy for just patches 1-4 to go in at this stage if that seems better. I wondered whether there should be special HOST_WIDE_INT values to mean "no size known" or "no offset known". One idea was to have an offset of -1 mean "not known", but negative offsets are apparently acceptable in some cases: /* If the base decl is a parameter we can have negative MEM_OFFSET in case of promoted subregs on bigendian targets. Trust the MEM_EXPR here. */ if (INTVAL (MEM_OFFSET (mem)) < 0 && ((INTVAL (MEM_SIZE (mem)) + INTVAL (MEM_OFFSET (mem))) * BITS_PER_UNIT) == ref->size) return true; We could instead use the minimum HOST_WIDE_INT as the offset. But that comes back to Joseph's point (from other threads) that, while C doesn't really cope properly with objects that are bigger than half the address space, glibc does actually allow such objects. The same concern applies to MEM_SIZE. The fact that most users treat the size as signed rather than unsigned might mean we don't always cope properly with larger objects in GCC. Even so, it seemed better to avoid any such assumptions in the general case, hence the MEM_*_KNOWN_P stuff above. Tested on x86_64-linux-gnu (all languages). Also tested by compiling gcc and g++ on: arm-linux-gnueabi h8300-elf x86_64-linux-gnu mips-linux-gnu powerpc-linux-gnu s390-linux-gnu sh-elf and making sure that there were no changes in the assembly generated for gcc.dg, g++.dg and gcc.c-torture. Richard