On Mon, 4 Nov 2013, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Mon, 4 Nov 2013, Richard Biener wrote:
> >
> >> On Fri, 1 Nov 2013, Richard Sandiford wrote:
> >>
> >> > I'm building one target for each supported CPU and comparing the
> >> > wide-int assembly output of gcc.c-torture, gcc.dg and g++.dg with
> >> > the corresponding output from the merge point.  This patch removes
> >> > all the differences I saw for alpha-linux-gnu in gcc.c-torture.
> >> >
> >> > Hunk 1: Preserve the current trunk behaviour in which the shift
> >> > count is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.
> >> > This was by inspection after hunk 5.
> >> >
> >> > Hunks 2 and 3: Two cases where force_fit_to_type could extend to
> >> > wider types and where we weren't extending according to the sign
> >> > of the source.  We should probably assert that the input is at
> >> > least as wide as the type...
> >> >
> >> > Hunk 4: The "&" in:
> >> >
> >> >   if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
> >> >       && (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)
> >> >
> >> > had got dropped during the conversion.
> >> >
> >> > Hunk 5: The old code was:
> >> >
> >> >   if (shift > 0)
> >> >     {
> >> >       *mask = r1mask.llshift (shift, TYPE_PRECISION (type));
> >> >       *val = r1val.llshift (shift, TYPE_PRECISION (type));
> >> >     }
> >> >   else if (shift < 0)
> >> >     {
> >> >       shift = -shift;
> >> >       *mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
> >> >       *val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
> >> >     }
> >> >
> >> > and these precision arguments had two purposes: to control which
> >> > bits of the first argument were shifted, and to control the
> >> > truncation mask for SHIFT_COUNT_TRUNCATED.  We need to pass a
> >> > width to the shift functions for the second.
> >> >
> >> > (BTW, I'm running the comparisons with CONST_WIDE_INT locally
> >> > moved to the end of the !GENERATOR_FILE list in rtl.def, since the
> >> > current position caused some spurious differences.  The "problem"
> >> > AFAICT is that hash_rtx hashes on code, RTL PRE creates registers
> >> > in the hash order of the associated expressions, RA uses register
> >> > numbers as a tie-breaker during ordering, and so the order of
> >> > rtx_code can indirectly influence register allocation.  First time
> >> > I'd realised that could happen, so just thought I'd mention it.
> >> > I think we should keep rtl.def in the current (logical) order
> >> > though.)
> >> >
> >> > Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for
> >> > wide-int?
> >>
> >> Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
> >> from double-int.c on the trunk?  If not, then putting this at the
> >> callers of wi::rshift and friends is clearly asking for future
> >> mistakes.
> >>
> >> Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
> >> than in machine instruction patterns was a mistake in the past.
> >
> > Oh, and my understanding is that it's maybe required for correctness
> > on the RTL side if middle-end code removes masking operations:
> > later constant propagation must not change the result because of
> > that.  I'm not sure this is the way it happens, but if we have
> >
> >   (lshift:si (reg:si 1) (and:qi (reg:qi 2) 0x1f))
> >
> > and we'd transform it (based on SHIFT_COUNT_TRUNCATED) to
> >
> >   (lshift:si (reg:si 1) (reg:qi 2))
> >
> > and later constant propagate 77 to reg:qi 2.
> >
> > That isn't the way SHIFT_COUNT_TRUNCATED should work, of course;
> > instead the backends should have a define_insn which matches the
> > 'and', and combine should try to match that.
> >
> > You can see that various architectures may have truncating shifts,
> > but not for all patterns, which results in stuff like
> >
> >   config/aarch64/aarch64.h:#define SHIFT_COUNT_TRUNCATED !TARGET_SIMD
> >
> > which clearly means that it's not implemented in terms of providing
> > shift insns which can match a shift operand that is masked.
> >
> > That is, either try to clean that up (hooray!), or preserve the
> > double-int.[ch] behavior (how does CONST_INT RTL constant folding
> > honor it?).
>
> The CONST_INT code does the modulus explicitly:
>
>   /* Truncate the shift if SHIFT_COUNT_TRUNCATED, otherwise make sure
>      the value is in range.  We can't return any old value for
>      out-of-range arguments because either the middle-end (via
>      shift_truncation_mask) or the back-end might be relying on
>      target-specific knowledge.  Nor can we rely on
>      shift_truncation_mask, since the shift might not be part of an
>      ashlM3, lshrM3 or ashrM3 instruction.  */
>   if (SHIFT_COUNT_TRUNCATED)
>     arg1 = (unsigned HOST_WIDE_INT) arg1 % width;
>   else if (arg1 < 0 || arg1 >= GET_MODE_BITSIZE (mode))
>     return 0;
>
> There was a strong feeling (from me and others) that wide-int.h
> should not depend on tm.h.  I don't think wi::lshift itself should
> know about SHIFT_COUNT_TRUNCATED, especially since (and I think we
> agree on this :-)) it's a terrible interface.  It would be better if
> it took a mode as an argument and (perhaps) returned a mask as a
> result, but that would make it even harder for wide-int.h to do the
> right thing, because it doesn't know about modes.  Having the callers
> do it seems like the right thing to me, both now and after any future
> tweaks to SHIFT_COUNT_TRUNCATED.

Hmm, yeah.  Oh well.
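For reference, what each caller would have to do is small enough; a
plain C sketch mirroring the CONST_INT code above (names and types are
illustrative only, not the final wi:: API):

  #include <stdbool.h>
  #include <stdint.h>

  /* Canonicalize a constant shift count before folding.  Returns
     false if the shift must not be folded at all.  */
  static bool
  canonicalize_shift_count (uint64_t *count, unsigned int width,
                            bool truncated /* SHIFT_COUNT_TRUNCATED */)
  {
    if (truncated)
      *count %= width;      /* the target promises hardware truncation */
    else if (*count >= width)
      return false;         /* out of range: leave the shift alone */
    return true;
  }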
> If your objection is that it's too easy to forget, then I don't think
> that's a problem at the RTL level.  simplify_const_binary_operation
> is the only place that does constant RTX shifts.
>
> Sorry, I don't have time to replace SHIFT_COUNT_TRUNCATED right now.

Personally I'd be happy with a global

  #define SHIFT_COUNT_TRUNCATED 0

(thus basically doing a source code transform based on that).  Any
target we regress on is left to its target maintainers to decide if
and how to fix.  Of course picking a single popular one and restoring
the behavior there would be nice (you may pick AARCH64 ...).
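The transform itself would be mechanical; a sketch, assuming the usual
defaults.h-style fallback for targets that define nothing:

  /* Central fallback; the per-target overrides get deleted.  */
  #ifndef SHIFT_COUNT_TRUNCATED
  #define SHIFT_COUNT_TRUNCATED 0
  #endif

after which uses like the CONST_INT folding quoted above reduce to the
plain range check:

  if (arg1 < 0 || arg1 >= GET_MODE_BITSIZE (mode))
    return 0;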
Most targets have weird comments like

  /* Immediate shift counts are truncated by the output routines (or
     was it the assembler?).  Shift counts in a register are truncated
     by ARM.  Note that the native compiler puts too large (> 32)
     immediate shift counts into a register and shifts by the
     register, letting the ARM decide what to do instead of doing that
     itself.  */
  /* This is all wrong.  Defining SHIFT_COUNT_TRUNCATED tells combine
     that code like (X << (Y % 32)) for register X, Y is equivalent to
     (X << Y).  On the arm, Y in a register is used modulo 256 for the
     shift.  Only for rotates is modulo 32 used.  */
  /* #define SHIFT_COUNT_TRUNCATED 1 */

and thus they end up erring on the safe side and using 0 for
SHIFT_COUNT_TRUNCATED anyway ...

A quick grep shows that the targets with SHIFT_COUNT_TRUNCATED 1 are
aarch64 (for !TARGET_SIMD), alpha, epiphany, iq2000, lm32, m32r, mep,
microblaze, mips (for !TARGET_LOONGSON_VECTORS), mn10300, nds32, pa,
picochip, score, sparc, stormy16, tilepro, v850 and xtensa.  I'm sure
one or the other uses it with wrong assumptions about how it works.

AFAICS SHIFT_COUNT_TRUNCATED provides the opportunity to remove
certain explicit mask operations for shift operands.

Thanks,
Richard.
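P.S. In source terms, the opportunity (and the hazard sketched
earlier) is just this kind of thing; a contrived C example:

  unsigned int
  shift (unsigned int x, unsigned int n)
  {
    /* With SHIFT_COUNT_TRUNCATED 1 the middle end may treat this as
       x << n and drop the mask, relying on hardware truncation.  */
    return x << (n & 0x1f);
  }

  /* Once the mask is gone, if n later constant-propagates to 77, a
     folder that forgets to re-apply the truncation (77 % 32 == 13)
     computes a value the hardware would never produce.  */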