On Fri, May 29, 2015 at 12:41:20PM +0930, Alan Modra wrote:
> I'll tell you one of the reasons why they are
> slower, as any decent hardware engineer could probably figure this out
> themselves anyway.  The record form instructions are cracked into two
> internal ops, the basic arithmetic/logic op, and a compare.  There's a
> limit to how much hardware can do in one clock cycle, or conversely,
> if you try to do more your clock must be slower.

Logical and simple arithmetic record-form ops aren't cracked, according
to our pipeline descriptions, and some simple testing (which could well
be flawed :-) ).  Of course I agree that cmp is better than and., but it
will execute pretty much the same in most code as far as I can tell.

> > > one of the aims of the wider patch I was working
> > > on was to remove patterns like rotlsi3_64, ashlsi3_64, lshrsi3_64 and
> > > ashrsi3_64.
> >
> > We will need such patterns no matter what; the compiler cannot magically
> > know what machine insns set the high bits of a 64-bit reg to zero.
>
> No, not by magic.  I define EXTEND_OP in rs6000.h and use it in
> record_value_for_reg.  Full patch follows.  I see enough code gen
> improvements on powerpc64le to make this patch worth pursuing,
> things like "rlwinm 0,5,6,0,25; extsw 0,0" being converted to
> "rldic 0,5,6,52".  No doubt due to being able to prove an int var
> doesn't have the sign bit set.  Hmm, in fact the 52 says it is
> known to be only 6 bits before shifting.

Ah, interesting.  So you let reg_stat know about the full register
result in cases where the RTL instruction does not mention the full
register at all.  That sounds like a worthwhile direction to explore :-)

> +/* Describe how rtl operations on registers behave on this target when
> +   operating on less than the entire register.  */
> +#define EXTEND_OP(OP) \
> +  (GET_MODE (OP) != SImode \
> +   || !TARGET_POWERPC64 \
> +   ? UNKNOWN \
> +   : (GET_CODE (OP) == AND \
> +      || GET_CODE (OP) == ZERO_EXTEND \
> +      || GET_CODE (OP) == ASHIFT \
> +      || GET_CODE (OP) == ROTATE \
> +      || GET_CODE (OP) == LSHIFTRT) \
> +   ? ZERO_EXTEND \
> +   : (GET_CODE (OP) == SIGN_EXTEND \
> +      || GET_CODE (OP) == ASHIFTRT) \
> +   ? SIGN_EXTEND \
> +   : UNKNOWN)

I think this is too simplistic though.  For example, AND with -7 is not
zero-extended (rlwinm rD,rA,0,31,28 sets the high 32 bits of rD to the
low 32 bits of rA).

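To make the AND with -7 case concrete, here is a rough C model of rlwinm
on a 64-bit implementation, as I read the ISA description: the rotated
low word is replicated into both halves of the register and then masked,
and the mask wraps around when MB > ME.  The names rotl32, mask64 and
model_rlwinm are made up for this illustration; none of this is GCC code,
and it is only a sketch under those assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by SH bits (SH taken modulo 32).  */
uint32_t
rotl32 (uint32_t x, unsigned sh)
{
  sh &= 31;
  return sh ? (x << sh) | (x >> (32 - sh)) : x;
}

/* 64-bit mask with ones from bit MB through bit ME, using IBM bit
   numbering (bit 0 is the most significant bit).  If MB > ME the
   mask wraps around through bit 63 and bit 0.  */
uint64_t
mask64 (unsigned mb, unsigned me)
{
  assert (mb < 64 && me < 64);
  uint64_t from_mb = ~UINT64_C (0) >> mb;        /* bits MB..63 */
  uint64_t to_me = ~UINT64_C (0) << (63 - me);   /* bits 0..ME */
  return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* Hypothetical model of "rlwinm rD,rA,SH,MB,ME" on a 64-bit machine:
   rotate the low word of rA, copy it into both halves, then apply
   the mask for bits MB+32..ME+32.  MB and ME must be below 32.  */
uint64_t
model_rlwinm (uint64_t ra, unsigned sh, unsigned mb, unsigned me)
{
  uint32_t r = rotl32 ((uint32_t) ra, sh);
  uint64_t both = ((uint64_t) r << 32) | r;
  return both & mask64 (mb + 32, me + 32);
}
```

With this model, model_rlwinm (x, 0, 31, 28) keeps the low word of x in
the high half of the result, because the wrapped mask covers all of the
high 32 bits.  model_rlwinm (x, 6, 0, 25), the "rlwinm 0,5,6,0,25" quoted
above, always has a zero high half, because its mask does not wrap.
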
In general, everything depends on what exact machine insn is used;
basing the decision on the RTL leads to duplication, is fragile, and
_will_ get out of synch.


Segher