Dale Johannesen wrote:

Well, no, what is supposed to happen (I haven't tried it for a while, so I don't promise
this still works) is code like this:


.hotsection:
loop:
  conditional branch (i==1000) to L2
L1:
  /* do stuff */
end loop:

/* still in hot section  */
L2:  jmp L3

.coldsection:
L3:
  i = 0;
  jmp L1


Well, even then, using the cold section can increase the hot section size, depending on the target and, for some
targets, on the maximum supported distance to the cold section.


For SH, using the cold section, you get (for non-PIC):

L2: mov.l 0f,rn
   jmp @rn
   nop
   .balign 4
0:  .long L3

   .coldsection:
L3: mov.l 0f,rn
   jmp @rn
   mov #0,rn
   .balign 4
0:  .long L1

I.e. 10 to 12 bytes each in the hot and cold sections.
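
To spell out where the 10-to-12 figure comes from: each stub is three 2-byte instructions, optional padding from the .balign 4, and a 4-byte address literal. Here is a rough sketch of that arithmetic. Python is used only for illustration, and stub_size is a made-up name, not anything in the compiler:

```python
def stub_size(start_addr):
    """Bytes taken by a mov.l / jmp / delay-slot stub plus its literal.

    start_addr is the (2-byte aligned) address of the mov.l.
    Assumes every SH instruction here is 2 bytes, as in the listing above.
    """
    insns = 3 * 2                 # mov.l, jmp, delay-slot instruction
    end = start_addr + insns
    pad = (-end) % 4              # padding inserted by .balign 4
    return insns + pad + 4        # plus the 4-byte .long literal


if __name__ == "__main__":
    for addr in (0, 2):
        print(hex(addr), stub_size(addr))
```

So whether you pay 10 or 12 bytes depends only on whether the stub happens to start on a 4-byte boundary.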

Without the cold section, you need only 4 bytes:
L2: bra L1
   mov #0,rn

Note also that in order to avoid the condjump-around-jump syndrome, L2 has
to be within about +-256 bytes of the condjump.
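
As a sketch of that range constraint: it assumes the SH bt/bf encoding, with a signed 8-bit displacement scaled by 2 and taken relative to the branch address plus 4. The function name is hypothetical and this is not code from the port:

```python
def condbranch_reaches(branch_addr, target_addr):
    """True if an SH bt/bf at branch_addr can reach target_addr directly.

    Assumption: displacement is a signed 8-bit count of 2-byte units,
    relative to branch_addr + 4, i.e. -256 .. +254 bytes.
    """
    disp = target_addr - (branch_addr + 4)
    return disp % 2 == 0 and -256 <= disp <= 254


if __name__ == "__main__":
    print(condbranch_reaches(0x1000, 0x1080))   # nearby L2: fits
    print(condbranch_reaches(0x1000, 0x1200))   # too far: needs a jump-around
```

If L2 lands outside that window, the condjump has to be inverted around an unconditional branch, which is what placing L2 was meant to avoid.
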
Should I do custom basic block reordering in machine_dependent_reorg to clean up
the turds of hot and cold partitioning?
