On 9/8/22 18:44, Richard Henderson wrote:
On 9/8/22 22:18, Leandro Lupori wrote:
PowerPC64 processors handle direct branches better than indirect
ones, resulting in fewer stalled cycles and branch misses.
However, PPC's tb_target_set_jmp_target() was only using direct
branches for 16-bit jumps, while PowerPC64's unconditional branch
instructions are able to handle displacements of up to 26 bits.
To take advantage of this, jumps whose displacements need between
17 and 26 bits are now also converted to direct branches.
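A minimal sketch of the range check and encoding described above, for
readers following along. The names disp_fits_b() and encode_b() are
made up for this illustration and are not the helpers used in tcg/ppc;
the only facts assumed are that the I-form "b" instruction has primary
opcode 18 and a word-aligned, signed 26-bit displacement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Primary opcode 18 (I-form unconditional branch), AA = 0, LK = 0. */
#define PPC_B 0x48000000u

/*
 * Hypothetical helper: true if disp is word-aligned and fits the
 * signed 26-bit displacement of "b", i.e. -2^25 <= disp < 2^25.
 */
static bool disp_fits_b(int64_t disp)
{
    return (disp & 3) == 0
        && disp >= -(INT64_C(1) << 25)
        && disp < (INT64_C(1) << 25);
}

/*
 * Hypothetical helper: encode "b target" for an instruction located
 * at pc. The caller is expected to have checked disp_fits_b() first.
 */
static uint32_t encode_b(uint64_t pc, uint64_t target)
{
    int64_t disp = (int64_t)(target - pc);

    /* Keep bits 2..25 of the displacement; the low two bits are AA/LK. */
    return PPC_B | ((uint32_t)disp & 0x03fffffcu);
}
```

Anything that fails disp_fits_b() would still have to go through the
indirect mtctr+bcctr path.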
This doesn't work because you have to be able to unset the jump as
well, and your two-step sequence doesn't handle that. (You wind up
with the two insn address load reset, but the jump continuing to the
previous target -- boom.)
Hello Richard, thanks for your review!
Right, I hadn't noticed this issue.
For v2.07+, you could use stq to update 4 insns atomically.
I'll try this alternative in v2, so that more CPUs can benefit from this
change.
For v3.1+, you can eliminate TCG_REG_TB, using prefixed pc-relative
addressing instead.
Which brings you back to only needing to update 8 bytes atomically
(select either paddi to compute the address to feed to the following
mtctr+bcctr, or direct branch + nop, leaving the mtctr+bcctr alone
and unreachable).
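As a rough sketch of the 8-byte update: both alternatives occupy the
same two instruction words, so one aligned 64-bit store can switch
between them. The helpers pack_insn_pair() and patch_slot() are
hypothetical names for this illustration, the endianness is passed in
as a parameter only to keep the example self-contained, and a real
implementation would also have to flush the instruction cache after
the store.

```c
#include <stdatomic.h>
#include <stdint.h>

#define PPC_NOP 0x60000000u   /* ori 0,0,0 */

/*
 * Hypothetical helper: combine two 32-bit instructions into one
 * 64-bit value so that i0 ends up at the lower address in memory.
 */
static uint64_t pack_insn_pair(uint32_t i0, uint32_t i1,
                               int host_big_endian)
{
    return host_big_endian ? ((uint64_t)i0 << 32) | i1
                           : ((uint64_t)i1 << 32) | i0;
}

/*
 * Hypothetical helper: replace an 8-byte-aligned two-instruction slot
 * with a single store, so another thread executing the slot sees
 * either the old pair or the new pair, never a mix of the two.
 * (Instruction cache maintenance is deliberately left out here.)
 */
static void patch_slot(_Atomic uint64_t *slot, uint64_t pair)
{
    atomic_store_explicit(slot, pair, memory_order_relaxed);
}
```

Setting the jump would store the pair (direct branch, PPC_NOP);
resetting it would store the two words of the paddi in the same way.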
(Actually, there are lots of updates one could make to tcg/ppc for v3.1...)
r~