Jakub Jelinek <ja...@redhat.com> writes: > Hi! > > The following testcase ICEs, because during try_combine of i3: > (insn 18 17 19 2 (parallel [ > (set (reg:CCO 17 flags) > (eq:CCO (plus:OI (sign_extend:OI (reg:TI 96)) > (const_int 1 [0x1])) > (sign_extend:OI (plus:TI (reg:TI 96) > (const_int 1 [0x1]))))) > (set (reg:TI 98) > (plus:TI (reg:TI 96) > (const_int 1 [0x1]))) > ]) "pr93376.c":8:10 223 {*addvti4_doubleword_1} > (expr_list:REG_UNUSED (reg:TI 98) > (expr_list:REG_DEAD (reg:TI 96) > (nil)))) > and i2: > (insn 17 37 18 2 (set (reg:TI 96) > (const_wide_int 0x7fffffffffffffffffffffffffffffff)) "pr93376.c":8:10 > 65 {*movti_internal} > (nil)) > the eq in there gets simplified into: > (eq:CCO (const_wide_int 0x080000000000000000000000000000000) > (const_wide_int 0x80000000000000000000000000000000)) > and simplify-rtx.c tries to simplify it by simplifying MINUS > of the two operands. > Now, i386 defines MAX_BITSIZE_MODE_ANY_INT to 128, because OImode > and XImode are used mainly as a placeholder for the vector modes; > these new signed overflow patterns are an exception to that, > but what they really need is just TImode precision + 1 (maybe 2 worst case) > bits at any time. > > wide-int.h defines WIDE_INT_MAX_ELTS in a way that it contains one more > HWI above number of HWIs to cover WIDE_INT_MAX_ELTS, so on i386 that is > 3 HWIs, meaning that TImode precision + 1/2 bits is still representable in > there. Unfortunately, the way wi::sub_large is implemented, it needs > not just those 3 HWIs, but one HWI above the maximum of the lengths of > both operands, which means it buffer overflows, overwrites the following > precision in wide_int_storage and ICEs later on. The need for 4 HWIs is > only temporary, because canonize immediately after it canonicalizes it > back to 3 HWIs only. > > Attached are two patches, the first one is admittedly a hack, but could be > considered an optimization too, simply before overwriting the last HWI > ({add,sub}_large seem to be the only one that have such len++) perform > a cheap check if canonize won't optimize it away immediately and in such > case don't store it. Yes, we are playing with fire because full OImode > is 256 bits and we can't store that many, but we in the end only need 129 > bits. > > The other patch is something suggested by Richard S., avoid using OImode > for this and instead use a partial int mode that is smaller. This is still > playing with fire because even the partial int mode is larger than > MAX_BITSIZE_MODE_ANY_INT, but at least its precision fits into those 3 > WIDE_INT_MAX_ELTS wide-int.h uses.
I think we should increase MAX_BITSIZE_MODE_ANY_INT to the precision of POImode at the same time, and make that precision less than 192 (191 maybe?). That way everything is "correct" without increasing the size of wide_int. > The disadvantage of that approach is that it is more costly at compile > time, various RA data structures contains number_of_modes^2 sized > arrays, and RA initialization will want to compute at runtime various > properties of each of the modes. True, but if that's a real source of slow compilation in practice then we should do something about it independently of this, since many mode combinations make no sense. If I've counted correctly, x86 already has 109 modes, so we're talking about less than a 1% increase in O(modes) operations and less than a 1.9% increase in O(modes^2) operations. Thanks, Richard > I've tried to think about other ways how to represent signed overflow check > in RTL, but didn't find any that would actually properly describe it and not > be way too complicated for the patterns. > > So, my preference is the first patch, but Richard S. doesn't like that much. > > Both patches have been bootstrapped/regtested on x86_64-linux and > i686-linux, ok for trunk (which one)? > > Jakub