On Mon, 4 Nov 2013, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Mon, 4 Nov 2013, Richard Biener wrote:
> >
> >> On Fri, 1 Nov 2013, Richard Sandiford wrote:
> >>
> >> > I'm building one target for each supported CPU and comparing the
> >> > wide-int assembly output of gcc.c-torture, gcc.dg and g++.dg with
> >> > the corresponding output from the merge point.  This patch removes
> >> > all the differences I saw for alpha-linux-gnu in gcc.c-torture.
> >> >
> >> > Hunk 1: Preserve the current trunk behaviour in which the shift
> >> > count is truncated if SHIFT_COUNT_TRUNCATED and not otherwise.
> >> > This was by inspection after hunk 5.
> >> >
> >> > Hunks 2 and 3: Two cases where force_fit_to_type could extend to
> >> > wider types and where we weren't extending according to the sign
> >> > of the source.  We should probably assert that the input is at
> >> > least as wide as the type...
> >> >
> >> > Hunk 4: The "&" in:
> >> >
> >> >   if ((TREE_INT_CST_HIGH (arg1) & mask_hi) == mask_hi
> >> >       && (TREE_INT_CST_LOW (arg1) & mask_lo) == mask_lo)
> >> >
> >> > had got dropped during the conversion.
> >> >
> >> > Hunk 5: The old code was:
> >> >
> >> >   if (shift > 0)
> >> >     {
> >> >       *mask = r1mask.llshift (shift, TYPE_PRECISION (type));
> >> >       *val = r1val.llshift (shift, TYPE_PRECISION (type));
> >> >     }
> >> >   else if (shift < 0)
> >> >     {
> >> >       shift = -shift;
> >> >       *mask = r1mask.rshift (shift, TYPE_PRECISION (type), !uns);
> >> >       *val = r1val.rshift (shift, TYPE_PRECISION (type), !uns);
> >> >     }
> >> >
> >> > and these precision arguments had two purposes: to control which
> >> > bits of the first argument were shifted, and to control the
> >> > truncation mask for SHIFT_COUNT_TRUNCATED.  We need to pass a
> >> > width to the shift functions for the second.
> >> >
> >> > (BTW, I'm running the comparisons with CONST_WIDE_INT locally
> >> > moved to the end of the !GENERATOR_FILE list in rtl.def, since the
> >> > current position caused some spurious differences.  The "problem"
> >> > AFAICT is that hash_rtx hashes on code, RTL PRE creates registers
> >> > in the hash order of the associated expressions, RA uses register
> >> > numbers as a tie-breaker during ordering, and so the order of
> >> > rtx_code can indirectly influence register allocation.  First time
> >> > I'd realised that could happen, so just thought I'd mention it.
> >> > I think we should keep rtl.def in the current (logical) order
> >> > though.)
> >> >
> >> > Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK for
> >> > wide-int?
> >>
> >> Bah - can you instead try removing the use of SHIFT_COUNT_TRUNCATED
> >> from double-int.c on the trunk?  If not, then putting this at the
> >> callers of wi::rshift and friends is clearly asking for future
> >> mistakes.
> >>
> >> Oh, and I think that honoring SHIFT_COUNT_TRUNCATED anywhere else
> >> than in machine instruction patterns was a mistake in the past.
> >
> > Oh, and my understanding is that it's maybe required for correctness
> > on the RTL side if middle-end code removes masking operations:
> > later constant propagation must not change the result because of
> > that.  I'm not sure this is the way it happens, but if we have
> >
> >   (lshift:si (reg:si 1) (and:qi (reg:qi 2) 0x1f))
> >
> > and we'd transform it (based on SHIFT_COUNT_TRUNCATED) to
> >
> >   (lshift:si (reg:si 1) (reg:qi 2))
> >
> > and later constant propagate 77 to reg:qi 2.
> >
> > That isn't the way SHIFT_COUNT_TRUNCATED should work, of course;
> > instead the backends should have a define_insn which matches the
> > 'and', and combine should try to match that.
> >
> > You can see that various architectures may have truncating shifts,
> > but not for all patterns, which results in stuff like
> >
> >   config/aarch64/aarch64.h:#define SHIFT_COUNT_TRUNCATED !TARGET_SIMD
> >
> > which clearly means that it's not implemented in terms of providing
> > shift insns which can match a shift operand that is masked.
> >
> > That is, either try to clean that up (hooray!), or preserve the
> > double-int.[ch] behavior (how does CONST_INT RTL constant folding
> > honor it?).
>
> The CONST_INT code does the modulus explicitly:
>
>   /* Truncate the shift if SHIFT_COUNT_TRUNCATED, otherwise make sure
>      the value is in range.  We can't return any old value for
>      out-of-range arguments because either the middle-end (via
>      shift_truncation_mask) or the back-end might be relying on
>      target-specific knowledge.  Nor can we rely on
>      shift_truncation_mask, since the shift might not be part of an
>      ashlM3, lshrM3 or ashrM3 instruction.  */
>   if (SHIFT_COUNT_TRUNCATED)
>     arg1 = (unsigned HOST_WIDE_INT) arg1 % width;
>   else if (arg1 < 0 || arg1 >= GET_MODE_BITSIZE (mode))
>     return 0;
>
> There was a strong feeling (from me and others) that wide-int.h
> should not depend on tm.h.  I don't think wi::lshift itself should
> know about SHIFT_COUNT_TRUNCATED, especially since (and I think we
> agree on this :-)) it's a terrible interface.  It would be better if
> it took a mode as an argument and (perhaps) returned a mask as a
> result, but that would make it even harder for wide-int.h to do the
> right thing, because it doesn't know about modes.  Having the callers
> do it seems like the right thing to me, both now and after any future
> tweaks to SHIFT_COUNT_TRUNCATED.

Hmm, yeah.  Oh well.
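For reference, what each caller would have to do is small enough; a
plain C sketch mirroring the CONST_INT code above (names and types are
illustrative only, not the final wi:: API):

  #include <stdbool.h>
  #include <stdint.h>

  /* Canonicalize a constant shift count before folding.  Returns
     false if the shift must not be folded at all.  */
  static bool
  canonicalize_shift_count (uint64_t *count, unsigned int width,
                            bool truncated /* SHIFT_COUNT_TRUNCATED */)
  {
    if (truncated)
      *count %= width;      /* the target promises hardware truncation */
    else if (*count >= width)
      return false;         /* out of range: leave the shift alone */
    return true;
  }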
> If your objection is that it's too easy to forget, then I don't think
> that's a problem at the RTL level.  simplify_const_binary_operation
> is the only place that does constant RTX shifts.
>
> Sorry, I don't have time to replace SHIFT_COUNT_TRUNCATED right now.

Personally I'd be happy with a global

  #define SHIFT_COUNT_TRUNCATED 0

(thus basically doing a source code transform based on that).  Any
target we regress on is left to its target maintainers to decide if
and how to fix.  Of course picking a single popular one and restoring
the behavior there would be nice (you may pick AARCH64 ...).
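The transform itself would be mechanical; a sketch, assuming the usual
defaults.h-style fallback for targets that define nothing:

  /* Central fallback; the per-target overrides get deleted.  */
  #ifndef SHIFT_COUNT_TRUNCATED
  #define SHIFT_COUNT_TRUNCATED 0
  #endif

after which uses like the CONST_INT folding quoted above reduce to the
plain range check:

  if (arg1 < 0 || arg1 >= GET_MODE_BITSIZE (mode))
    return 0;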
Most targets have weird comments like

  /* Immediate shift counts are truncated by the output routines (or
     was it the assembler?).  Shift counts in a register are truncated
     by ARM.  Note that the native compiler puts too large (> 32)
     immediate shift counts into a register and shifts by the
     register, letting the ARM decide what to do instead of doing that
     itself.  */
  /* This is all wrong.  Defining SHIFT_COUNT_TRUNCATED tells combine
     that code like (X << (Y % 32)) for register X, Y is equivalent to
     (X << Y).  On the arm, Y in a register is used modulo 256 for the
     shift.  Only for rotates is modulo 32 used.  */
  /* #define SHIFT_COUNT_TRUNCATED 1 */

and thus they end up erring on the safe side and using 0 for
SHIFT_COUNT_TRUNCATED anyway ...

A quick grep shows that the targets with SHIFT_COUNT_TRUNCATED 1 are
aarch64 (for !TARGET_SIMD), alpha, epiphany, iq2000, lm32, m32r, mep,
microblaze, mips (for !TARGET_LOONGSON_VECTORS), mn10300, nds32, pa,
picochip, score, sparc, stormy16, tilepro, v850 and xtensa.  I'm sure
one or the other uses it with wrong assumptions about how it works.

AFAICS SHIFT_COUNT_TRUNCATED provides the opportunity to remove
certain explicit mask operations for shift operands.

Thanks,
Richard.
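P.S. In source terms, the opportunity (and the hazard sketched
earlier) is just this kind of thing; a contrived C example:

  unsigned int
  shift (unsigned int x, unsigned int n)
  {
    /* With SHIFT_COUNT_TRUNCATED 1 the middle end may treat this as
       x << n and drop the mask, relying on hardware truncation.  */
    return x << (n & 0x1f);
  }

  /* Once the mask is gone, if n later constant-propagates to 77, a
     folder that forgets to re-apply the truncation (77 % 32 == 13)
     computes a value the hardware would never produce.  */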