I tried an extra fwprop pass and got some very interesting results!

First, a caveat: I just cut and pasted the extra pass into the pass list, not worrying about the details.

     NEXT_PASS (pass_rtl_fwprop);
     NEXT_PASS (pass_local_alloc);

To show the effects, here is the assembler code dump (which is easier to read than RTL).

(1) Just splitters, normal passes for -O3, no attempt to remove OR rx,0
(without splitters it's almost 2x bigger)

 23                   /* prologue: function */
 24                   /* frame size = 0 */
 25 0000 FC01              movw r30,r24
 26                   .LM2:
 27 0002 9181              ldd r25,Z+1
 28 0004 80E0              ldi r24,lo8(0)
 29                   .LVL1:
 30 0006 60E0              ldi r22,lo8(0)
 31 0008 2281              ldd r18,Z+2
 32 000a 30E0              ldi r19,lo8(0)
 33 000c 6060              ori r22,lo8(0)
 34 000e 762F              mov r23,r22
 35 0010 822B              or r24,r18
 36 0012 932B              or r25,r19
 37 0014 2481              ldd r18,Z+4
 38 0016 622B              or r22,r18
 39 0018 7060              ori r23,lo8(0)
 40 001a 8060              ori r24,lo8(0)
 41 001c 9060              ori r25,lo8(0)
 42 001e 2381              ldd r18,Z+3
 43 0020 40E0              ldi r20,lo8(0)
 44 0022 6060              ori r22,lo8(0)
 45 0024 722B              or r23,r18
 46 0026 832B              or r24,r19
 47 0028 942B              or r25,r20
 48                   /* epilogue start */
 49                   .LM3:
 50 002a 0895              ret

(2) Same code, but now with the extra fwprop pass:

 23                   /* prologue: function */
 24                   /* frame size = 0 */
 25 0000 FC01              movw r30,r24
 26                   .LM2:
 27 0002 4181              ldd r20,Z+1
 28 0004 942F              mov r25,r20
 29 0006 70E0              ldi r23,lo8(0)
 30 0008 3281              ldd r19,Z+2
 31 000a 832F              mov r24,r19
 32                   .LVL1:
 33 000c 9060              ori r25,lo8(0)
 34 000e 2481              ldd r18,Z+4
 35 0010 622F              mov r22,r18
 36 0012 8060              ori r24,lo8(0)
 37 0014 2381              ldd r18,Z+3
 38 0016 722B              or r23,r18
 39                   /* epilogue start */
 40                   .LM3:
 41 0018 0895              ret


Much better! But note that OR rx,0 instructions are still being created. (There
were none before the fwprop pass.) As there are still obvious propagation
opportunities, I suspect that these are being added by local-alloc propagation
after an imperfect fwprop.

(4) Now with fwprop and a NOP splitter for OR rx,0:

 23                   /* prologue: function */
 24                   /* frame size = 0 */
 25 0000 FC01              movw r30,r24
 26                   .LM2:
 27 0002 4181              ldd r20,Z+1
 28 0004 942F              mov r25,r20
 29 0006 70E0              ldi r23,lo8(0)
 30 0008 3281              ldd r19,Z+2
 31 000a 832F              mov r24,r19
 32                   .LVL1:
 33 000c 2481              ldd r18,Z+4
 34 000e 622F              mov r22,r18
 35 0010 2381              ldd r18,Z+3
 36 0012 722B              or r23,r18
 37                   /* epilogue start */
 38                   .LM3:
 39 0014 0895              ret

No difference apart from the removal of OR rx,0. (I expected that.)


(5) And just for the hell of it, 2 passes of fwprop before local-alloc.
No NOP splitter.

     NEXT_PASS (pass_rtl_fwprop);
     NEXT_PASS (pass_rtl_fwprop);
     NEXT_PASS (pass_local_alloc);


 23                   /* prologue: function */
 24                   /* frame size = 0 */
 25 0000 FC01              movw r30,r24
 26                   .LM2:
 27 0002 9181              ldd r25,Z+1
 28 0004 8281              ldd r24,Z+2
 29                   .LVL1:
 30 0006 6481              ldd r22,Z+4
 31 0008 7381              ldd r23,Z+3
 32                   /* epilogue start */
 33                   .LM3:
 34 000a 0895              ret

Which is optimal. TADA!
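
As a sanity check that listings (2) and (5) really compute the same thing, here is a small Python sketch that interprets just the few AVR instructions used above. It is a toy written for this post, not part of any toolchain. It assumes the usual avr-gcc convention of a 32-bit result in r25:r22 (r22 is the low byte), skips the initial movw, treats memory as indexed by the Z offset, and writes lo8(0) as a plain 0. The names run, ONE_PASS and TWO_PASS are made up for the illustration.

```python
def run(program, mem):
    """Interpret a list of instruction strings; mem maps Z offsets to byte values."""
    reg = [0] * 32
    for line in program:
        op, args = line.split(None, 1)
        dst, src = [a.strip() for a in args.split(",")]
        d = int(dst[1:])                    # "r25" -> 25
        if op == "ldd":                     # ldd rD,Z+q : load byte at offset q
            reg[d] = mem[int(src[2:])]
        elif op == "ldi":                   # ldi rD,K   : load immediate
            reg[d] = int(src)
        elif op == "mov":                   # mov rD,rS  : register copy
            reg[d] = reg[int(src[1:])]
        elif op == "or":                    # or rD,rS   : bitwise OR with register
            reg[d] |= reg[int(src[1:])]
        elif op == "ori":                   # ori rD,K   : bitwise OR with immediate
            reg[d] |= int(src)
        else:
            raise ValueError("unsupported instruction: " + op)
    return tuple(reg[22:26])                # (r22, r23, r24, r25)

# Listing (2): one fwprop pass, transcribed without the movw and ret.
ONE_PASS = [
    "ldd r20,Z+1", "mov r25,r20", "ldi r23,0", "ldd r19,Z+2", "mov r24,r19",
    "ori r25,0", "ldd r18,Z+4", "mov r22,r18", "ori r24,0", "ldd r18,Z+3",
    "or r23,r18",
]

# Listing (5): two fwprop passes.
TWO_PASS = ["ldd r25,Z+1", "ldd r24,Z+2", "ldd r22,Z+4", "ldd r23,Z+3"]

mem = {1: 0x11, 2: 0x22, 3: 0x33, 4: 0x44}
print(run(ONE_PASS, mem) == run(TWO_PASS, mem))
```

Both versions leave bytes 4, 3, 2, 1 in r22, r23, r24, r25, so the seven extra instructions in listing (2) are pure overhead.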

This would indicate that simplify-rtx inside fwprop is removing OR rx,0
but not picking up the additional forward propagation opportunities that this reveals.
This would seem to be an avoidable limitation.
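
To illustrate the kind of effect I mean, here is a toy Python model. It is emphatically not how fwprop is implemented; the IR, the names (fwprop_pass, simplify, t1..t4, mem1) and the "snapshot of definitions at the start of the pass" behaviour are all assumptions chosen to mimic what the listings suggest. Each pass propagates simple definitions into their uses and simplifies x | 0 to x, but a copy that only appears as a result of simplification is not propagated until the next pass.

```python
def is_reg(x):
    """Pseudo registers are strings starting with 't'; other values are leaves."""
    return isinstance(x, str) and x.startswith("t")

def simplify(expr):
    """Fold ("ior", a, 0) or ("ior", 0, b) down to the non-zero operand."""
    if isinstance(expr, tuple):
        _, a, b = expr
        if b == 0:
            return a
        if a == 0:
            return b
    return expr

def fwprop_pass(insns):
    """One toy pass: substitute simple definitions, then simplify."""
    # Snapshot of simple (non-ior) definitions taken BEFORE any rewriting,
    # so copies created by simplification in this pass are not seen yet.
    defs = {dst: e for dst, e in insns if not isinstance(e, tuple)}

    def subst(x):
        return defs.get(x, x) if is_reg(x) else x

    out = []
    for dst, e in insns:
        if isinstance(e, tuple):
            e = ("ior", subst(e[1]), subst(e[2]))
        else:
            e = subst(e)
        out.append((dst, simplify(e)))
    return out

insns = [
    ("t1", 0),
    ("t2", "mem1"),
    ("t3", ("ior", "t2", "t1")),
    ("t4", ("ior", "t3", "t1")),
]

once = fwprop_pass(insns)    # t4 is now a plain copy of t3
twice = fwprop_pass(once)    # t4 now reads mem1 directly
print(once[3], twice[3])
```

In this model the second pass finishes the job and a third pass changes nothing, so iterating to a fixed point (or re-queuing the insns that were just simplified) would be enough.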

Andy

