On Thu, Jun 03, 2021 at 02:49:15PM +0800, Xionghu Luo wrote:
> If remove the rotate in simplify-rtx like below:
>
> +++ b/gcc/simplify-rtx.c
> @@ -3830,10 +3830,16 @@ simplify_context::simplify_binary_operation_1
> (rtx_code code,
>      case ROTATE:
>        if (trueop1 == CONST0_RTX (mode))
>          return op0;
> +
> +      if (GET_CODE (trueop0) == ROTATE && trueop1 == GEN_INT (64)
> +          && CONST_INT_P (XEXP (trueop0, 1))
> +          && INTVAL (XEXP (trueop0, 1)) == 64)
> +        return XEXP (trueop0, 0);

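For reference, a minimal standalone sketch of the identity the quoted patch relies on: rotating a 128-bit value by 64 twice gives back the original, and more generally two rotates cancel when their amounts sum to a multiple of the mode's width. This is plain C, not GCC internals; the u128 type and the rotl128 and rotates_cancel helpers are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit mode: two 64-bit halves. */
typedef struct { uint64_t hi, lo; } u128;

/* Rotate x left by n bits; the caller must pass n < 128. */
static u128 rotl128(u128 x, unsigned n)
{
    assert(n < 128);
    if (n >= 64) {
        /* A rotate by 64 just swaps the two halves. */
        uint64_t t = x.hi;
        x.hi = x.lo;
        x.lo = t;
        n -= 64;
    }
    if (n == 0)
        return x;   /* avoid an undefined shift by 64 below */
    u128 r;
    r.hi = (x.hi << n) | (x.lo >> (64 - n));
    r.lo = (x.lo << n) | (x.hi >> (64 - n));
    return r;
}

/* Nonzero when a rotate by a followed by a rotate by b is a no-op
   in a mode that is `bits` wide (assumes bits > 0).  */
static int rotates_cancel(unsigned a, unsigned b, unsigned bits)
{
    return (a + b) % bits == 0;
}
```

A check of this shape, taking the width from the mode instead of the literal 64, is one possible direction for generalizing the proof of concept; whether that is the right condition inside simplify-rtx is not something this sketch settles.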
(The hardcoded 64 needs improving -- but this is just a proof of concept
I'll assume :-) )

> Combine still fail to merge the two instructions:
>
> Trying 6 -> 7:
>     6: r120:KF#0=r125:KF#0<-<0x40
>       REG_DEAD r125:KF
>     7: [sfp:DI+r123:DI]=r120:KF#0<-<0x40
>       REG_DEAD r120:KF
> Successfully matched this instruction:
> (set (mem/c:V1TI (plus:DI (reg/f:DI 110 sfp)
>             (reg:DI 123)) [1 S16 A128])
>     (subreg:V1TI (reg:KF 125) 0))
> rejecting combination of insns 6 and 7
> original costs 4 + 4 = 8
> replacement cost 12

So what instructions were these?  Why did the store cost 4 but the new
one costs 12?

> By hacking the vsx_le_perm_store_v1ti INSN_COST from 12 to 8,

It should be the same cost as the other store!

> it could merge the instructions:
>
>    21: r125:KF=%v2:KF
>       REG_DEAD %v2:KF
>     2: NOTE_INSN_DELETED
>     3: NOTE_INSN_FUNCTION_BEG
>     6: NOTE_INSN_DELETED
>    17: r123:DI=0x20
>     7: [sfp:DI+r123:DI]=r125:KF#0
>       REG_DEAD r125:KF
>    19: NOTE_INSN_DELETED
>    14: %v2:V1TI=[sfp:DI+r123:DI]
>       REG_DEAD r123:DI
>    15: use %v2:V1TI
>
> Then followed split1 pass will still split it to due to no dse pass
> between to remove the memory operations on stack, remove the rotate
> in swap won't face such problem since it runs before dse and no split
> pass between them:

Sure, but none of that is the point.  I asked if we did this for TImode
properly, and maybe we do, but:

>    22: r126:V1TI=r125:KF#0<-<0x40
>    23: [sfp:DI+r123:DI]=r126:V1TI<-<0x40

... this is V1TI mode.


Segher
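
The cost arithmetic in the quoted combine dump can be sketched as below. This is an illustration only, not combine's actual code: combination_is_profitable is a hypothetical name, and the rule "reject only when the replacement costs more than the originals together" is an assumption inferred from the log above, where 12 against 4 + 4 was rejected and 8 against 4 + 4 went through.

```c
#include <stdbool.h>

/* Hypothetical helper, not a GCC function.  Assumed rule: a combination
   is acceptable when the replacement insn costs no more than the two
   original insns together.  */
static bool combination_is_profitable(int cost_a, int cost_b,
                                      int replacement_cost)
{
    return replacement_cost <= cost_a + cost_b;
}
```

With the numbers from the dump, combination_is_profitable(4, 4, 12) is false and combination_is_profitable(4, 4, 8) is true, which matches the behaviour reported before and after the INSN_COST hack.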