On Fri, May 29, 2015 at 12:41:20PM +0930, Alan Modra wrote:
> I'll tell you one of the reasons why they are
> slower, as any decent hardware engineer could probably figure this out
> themselves anyway.  The record form instructions are cracked into two
> internal ops, the basic arithmetic/logic op, and a compare.  There's a
> limit to how much hardware can do in one clock cycle, or conversely,
> if you try to do more your clock must be slower.

Logical and simple arithmetic record-form ops aren't cracked, according
to our pipeline descriptions, and some simple testing (which could well
be flawed :-) ).

Of course I agree that cmp is better than and., but it will execute
pretty much the same in most code as far as I can tell.

> > > one of the aims of the wider patch I was working
> > > on was to remove patterns like rotlsi3_64, ashlsi3_64, lshrsi3_64 and
> > > ashrsi3_64.
> > 
> > We will need such patterns no matter what; the compiler cannot magically
> > know what machine insns set the high bits of a 64-bit reg to zero.
> 
> No, not by magic.  I define EXTEND_OP in rs6000.h and use it in
> record_value_for_reg.  Full patch follows.  I see enough code gen
> improvements on powerpc64le to make this patch worth pursuing,
> things like "rlwinm 0,5,6,0,25; extsw 0,0" being converted to
> "rldic 0,5,6,52".  No doubt due to being able to prove an int var
> doesn't have the sign bit set.  Hmm, in fact the 52 says it is
> known to be only 6 bits before shifting.

Ah, interesting.  So you let reg_stat know about the full register
result in cases where the RTL instruction does not mention the full
register at all.  That sounds like a worthwhile direction to explore :-)
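
For what it's worth, the rlwinm/extsw -> rldic rewrite quoted above can be
sanity-checked with a small C model.  This is only a sketch written from my
reading of the ISA descriptions of these insns; the function names are made
up for illustration and none of this is GCC code.

```c
#include <stdint.h>

/* Rotate a 64-bit value left by n bits, 0 <= n <= 63.  */
static uint64_t rotl64(uint64_t x, unsigned n)
{
  return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Model of "rldic RA,RS,SH,MB": rotate left by SH, then keep IBM bits
   MB..63-SH (bit 0 is the most significant bit).  Only the non-wrapping
   case MB <= 63-SH is modelled here.  */
static uint64_t rldic(uint64_t rs, unsigned sh, unsigned mb)
{
  uint64_t mask = (~UINT64_C(0) >> mb) & (~UINT64_C(0) << sh);
  return rotl64(rs, sh) & mask;
}

/* Model of the two-insn sequence "rlwinm RA,RS,6,0,25; extsw RA,RA".  */
static uint64_t rlwinm_6_0_25_extsw(uint64_t rs)
{
  uint32_t lo = (uint32_t) rs;
  uint32_t w = (lo << 6) | (lo >> 26);   /* rotate the low word left by 6 */
  w &= UINT32_C(0xFFFFFFC0);             /* keep IBM word bits 0..25 */
  /* extsw: copy bit 31 of the word into the high 32 bits.  */
  return (w & UINT32_C(0x80000000))
         ? (UINT64_C(0xFFFFFFFF00000000) | w)
         : (uint64_t) w;
}
```

In this model the two forms agree whenever the low word of the input is
below 64, i.e. fits in 6 bits, and they differ as soon as it does not (an
input of 64 gives 0x1000 from the two-insn sequence but 0 from rldic).  So
the single insn is only valid because of what is known about the input.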

> +/* Describe how rtl operations on registers behave on this target when
> +   operating on less than the entire register.  */
> +#define EXTEND_OP(OP) \
> +  (GET_MODE (OP) != SImode           \
> +   || !TARGET_POWERPC64                      \
> +   ? UNKNOWN                         \
> +   : (GET_CODE (OP) == AND           \
> +      || GET_CODE (OP) == ZERO_EXTEND        \
> +      || GET_CODE (OP) == ASHIFT     \
> +      || GET_CODE (OP) == ROTATE     \
> +      || GET_CODE (OP) == LSHIFTRT)  \
> +   ? ZERO_EXTEND                     \
> +   : (GET_CODE (OP) == SIGN_EXTEND   \
> +      || GET_CODE (OP) == ASHIFTRT)  \
> +   ? SIGN_EXTEND                     \
> +   : UNKNOWN)

I think this is too simplistic though.  For example, AND with -7 is not
zero-extended (rlwinm rD,rA,0,31,28 sets the high 32 bits of rD to the low
32 bits of rA).
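
To make that concrete, here is a rough C model of rlwinm on a 64-bit
implementation, as I read the ISA description: rotate the low word,
replicate it into both halves, and AND with MASK(MB+32, ME+32).  The helper
names mask64 and rlwinm64 are invented for this sketch.

```c
#include <stdint.h>

/* MASK(mb, me) in IBM bit numbering (bit 0 is the most significant bit
   of the 64-bit register).  If mb > me the mask wraps around.  */
static uint64_t mask64(unsigned mb, unsigned me)
{
  uint64_t from_mb = ~UINT64_C(0) >> mb;        /* bits mb..63 */
  uint64_t to_me = ~UINT64_C(0) << (63 - me);   /* bits 0..me */
  return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* Model of "rlwinm RA,RS,SH,MB,ME" with 0 <= SH, MB, ME <= 31.  */
static uint64_t rlwinm64(uint64_t rs, unsigned sh, unsigned mb, unsigned me)
{
  uint32_t lo = (uint32_t) rs;
  uint32_t rot = sh ? (lo << sh) | (lo >> (32 - sh)) : lo;
  uint64_t both = ((uint64_t) rot << 32) | rot;  /* word in both halves */
  return both & mask64(mb + 32, me + 32);
}
```

With MB=31, ME=28 the mask wraps and covers the whole high word, so
rlwinm64 (x, 0, 31, 28) returns the low word of x in the high 32 bits and
(x & -7) in the low 32 bits.  A non-wrapping mask such as MB=24, ME=31
(AND with 0xff) does leave the high 32 bits zero.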

In general, everything depends on exactly which machine insn is used; basing
the decision on the RTL leads to duplication, is fragile, and _will_ get out
of sync.


Segher
