Hi Dmitry,
Sorry about the late reply - been on vacation and catching up today on
email.
Here's why I'm asking. In the following example, the dependence cost of
32 for cortex_a8_call causes insns 464 and 575 to be separated by insn
308 (in spite of having the same priority), because 575 is not ready at
tick 12. That causes separate IT-blocks to be generated for them on
Thumb-2.
;;<----> 9--> 300 r0=call [`spec_putc'] :cortex_a8_issue_branch
;;<----> 9--> 306 r3=sl 0>>0x18^r8 :cortex_a8_default
;;<----> 10--> 309 cc=cmp(r5,r8) :cortex_a8_default
;;<----> 11--> 307 r3=[r3*0x4+r9] :cortex_a8_load_store_1
;;<----> 12--> 464 (cc) r2=0x1 :cortex_a8_default
;;<----> 13--> 308 sl=sl<<0x8^r3 :cortex_a8_default
;;<----> 41--> 575 (cc) [sp+0x4]=r2 :cortex_a8_load_store_1
Insn 575 has a true dependency on call insn 300 through r2, which is a
call-used register (CALL_USED_REG). Because 464 sets r2 only
conditionally, 575 retains its true dependency on 300.
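
To make the arithmetic concrete, here is a minimal sketch of how the
issue tick of 575 follows from the dependence cost. It is not GCC code:
the function name and the model (an insn becomes ready at the latest
"producer issue tick + dependence cost" over its producers) are
simplifications for illustration, and the cost of 1 for the 464 -> 575
dependence is an assumption, not something taken from the dump.

```python
def earliest_ready_tick(producers):
    """Return the first tick at which a consumer insn can be ready.

    producers: list of (issue_tick, dependence_cost) pairs, one per
    insn the consumer depends on. Simplified model: the consumer is
    ready once every producer's cost has elapsed.
    """
    return max(tick + cost for tick, cost in producers)


CALL_TICK = 9      # insn 300 (the call) issues at tick 9 in the dump
SET_R2_TICK = 12   # insn 464 issues at tick 12 in the dump
SET_R2_COST = 1    # assumed cost for the 464 -> 575 dependence

# Dependence cost of 32 on the call: 575 cannot be ready before tick 41.
with_cost_32 = earliest_ready_tick([(CALL_TICK, 32), (SET_R2_TICK, SET_R2_COST)])

# Dependence cost of 1 on the call: the call no longer holds 575 back.
with_cost_1 = earliest_ready_tick([(CALL_TICK, 1), (SET_R2_TICK, SET_R2_COST)])

print(with_cost_32, with_cost_1)
```

Under these assumptions the first value matches the tick at which 575
is issued in the dump (9 + 32 = 41), while with a call cost of 1 the
insn would become ready right after 464.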
Setting cortex_a8_call cost to 1 saves 186 bytes on SPEC2000 INT (but
I'm not sure whether it's only because of less IT-block splitting).
I doubt that the size savings you are seeing come only from less
IT-block splitting. If you measured the number of spills, I suspect
you'd see fewer of them with this change, and that any change you see
in IT-block splitting is a consequence of that.
For the A8 this should be OK, but try a few benchmarks to be sure it
doesn't spring any surprises performance-wise.
cheers
Ramana
---
Ramana Radhakrishnan
PDSW Tools
ARM Ltd.