https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110807
Alexandre Oliva <aoliva at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aoliva at gcc dot gnu.org

--- Comment #11 from Alexandre Oliva <aoliva at gcc dot gnu.org> ---
> The new test fails with -m32

I've looked a bit into why.

The memmove is optimized out by vrp (or, if that's disabled, by dom) on lp64, because it's guarded by two conditions: _10 > sizeof(long), and !(_14 > 1), where _10 is a signed long (ptrdiff_t) computed as the difference between the _M_p of _M_finish and of _M_start in the preexisting vector, and _14 = (unsigned long)(_10*8 + _8), where _8 is the vector's finish offset.

For the _14 condition to hold, _14 must be in 0ul..1ul. Since _10 is long, _8 promotes to long on lp64, the addition is performed as a signed long, and the result is then converted to unsigned long. _8 is loaded from memory as an unsigned int, and nothing is known about it, so its promoted operand is 0l..0xffffffffl. For _14 to be <= 1ul, _10 * 8 must be in the range -0xffffffffl..1l, and therefore _10 must be in -0x1fffffffl..0l, which enables folding of the _10 condition, since the entire range is <= sizeof(long).

In the ilp32 case, _10 is int, so _10*8 promotes to unsigned int for the addition, whose result is then NOPped to unsigned long. _8 is also loaded from memory as an unsigned int, but because unsigned addition wraps around and _8 covers the full range, nothing can be inferred about the range of _10*8, and thus _10's range is only limited by overflow avoidance in the signed multiplication: -0x1fffffffl..0x1ffffffl. Therefore, the _10 compare cannot be folded, and the memmove call remains.

I think the missed optimization and the overall problem stem from the fact that the optimizers don't know the actual range of _M_offset. Ensuring it's visibly normalized at the uses where out-of-range _M_offsets might sneak in might be enough to avoid the warning and enable further optimizations.