On 08/17/2016 11:41 AM, Richard Henderson wrote:
On 08/17/2016 10:58 AM, Emilio G. Cota wrote:
(2) that we should start a new TB upon encountering a load-exclusive, so
that we maximize the chance of the store-exclusive being a part of the same
TB and thus have *nothing* extra between the beginning and commit of the
transaction.
I don't know how to do this. If it's easy to do, please let me know how
(for aarch64 at least, since that's the target I'm using).
It's a simple matter of peeking at the next instruction.
One way is to partially decode the insn before advancing the PC.
 static void disas_a64_insn (CPUARMState *env, DisasContext *s, int num_insns)
 {
     uint32_t insn = arm_ldl_code(env, s->pc, s->sctlr_b);
+
+    if (num_insns > 1 && (insn & xxx) == yyy) {
+        /* Start load-exclusive in a new TB. */
+        s->is_jmp = DISAS_UPDATE;
+        return;
+    }
     s->insn = insn;
     s->pc += 4;
     ...
Alternately, store num_insns into DisasContext, and do pc -= 4 in
disas_ldst_excl.
Actually, the mask check is the only really viable solution, and it needs to
happen before we do the tcg_gen_insn_start thing.
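
As a rough illustration of what the mask check might look like, here is a
standalone sketch. The mask and value are my reading of the ARMv8.0 A64
load/store-exclusive encoding group, not something taken from the patch, and
the helper names are hypothetical. Later architecture extensions reuse parts
of this encoding space, so treat the constants as a starting point to verify
against the architecture manual.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: ARMv8.0 A64 load/store-exclusive group.
 *   bits [29:24] == 0b001000 -> load/store exclusive class
 *   bit  [23]    == 0        -> exclusive (1 would be LDAR/STLR)
 *   bit  [22]    == 1        -> load (0 would be a store)
 * Size, pair and acquire bits are deliberately left out of the mask,
 * so LDXR, LDAXR, LDXP and LDAXP of any size all match.
 */
#define A64_LDX_MASK   0x3fc00000u
#define A64_LDX_VALUE  0x08400000u

/* Return true if insn looks like a load-exclusive. */
static bool a64_is_load_exclusive(uint32_t insn)
{
    return (insn & A64_LDX_MASK) == A64_LDX_VALUE;
}

/*
 * Hypothetical helper mirroring the condition in the quoted snippet:
 * end the current TB only if it already contains at least one insn,
 * so that the load-exclusive becomes the first insn of the next TB.
 */
static bool should_end_tb_before(uint32_t insn, int num_insns)
{
    return num_insns > 1 && a64_is_load_exclusive(insn);
}
```

The num_insns test matters: without it, a load-exclusive that is already the
first insn of a TB would end the TB again and never make progress.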
A couple of other notes, as I've thought about this some more.
If the start and end of the transaction are not in the same TB, the likelihood
of transaction failure should be very near 100%. Consider:
  * TB with ldrex ends before the strex.

  * Since the next TB hasn't been built yet, we'll definitely go
    through tb_find_physical, through the translator, and through
    the tcg compiler.

    (a) Which I think we can definitely assume will exhaust any
        resources associated with the transaction,
    (b) Which will abort the transaction,
    (c) Which, with the current code, will retry N times, with
        identical results, failing within the compiler each time,
    (d) Which, with the current code, will single-step through
        to the strex, as you saw.

  * Since we proceed to (d) the first time, we'll never succeed
    in creating the next TB, so we'll always iterate compilation N
    times, resulting in the single-step.
This is probably the real slow-down that you see.
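
To make the cost of steps (c) and (d) concrete, here is a toy model of the
retry-then-fallback loop. It is not the code from the patch set; the names
(tx_attempt_fn, run_with_retries, always_abort) are made up for illustration,
and a "transaction" is just a callback that reports success or abort.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: one attempt at running the ll/sc region transactionally. */
typedef bool (*tx_attempt_fn)(void *opaque);

struct tx_stats {
    unsigned attempts;   /* how many transactional attempts were made */
    bool fell_back;      /* true if we gave up and used the slow path */
};

/* Try up to max_retries times, then fall back (e.g. to single-stepping). */
static struct tx_stats run_with_retries(tx_attempt_fn attempt, void *opaque,
                                        unsigned max_retries)
{
    struct tx_stats st = { 0, false };

    for (unsigned i = 0; i < max_retries; i++) {
        st.attempts++;
        if (attempt(opaque)) {
            return st;           /* committed */
        }
    }
    st.fell_back = true;         /* every attempt aborted */
    return st;
}

/* Models the case above: the abort cause is deterministic. */
static bool always_abort(void *opaque)
{
    (void)opaque;
    return false;
}
```

With always_abort, every call pays for max_retries failed attempts and then
the fallback, which is the pattern described above: retrying cannot help when
the abort is caused by the translator running inside the transaction.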
Therefore, we must abort any transaction when we exit tcg-generated code,
whether through cpu_loop_exit or through the tcg epilogue. We should be able to use
the software controlled bits associated with the abort to tell what kind of
event led to the abort. However, we must bear in mind that (for both x86 and
ppc at least) we only have an 8-bit abort code. So we can't pass back a
pointer, for instance.
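
A minimal sketch of how an 8-bit abort code could be used, assuming the x86
RTM status layout (bit 0 set for an explicit abort, the code in bits 31:24,
as exposed by _XABORT_EXPLICIT and _XABORT_CODE in immintrin.h). The reason
names and helper functions are hypothetical and not part of QEMU. The sketch
only decodes a status word, so it runs without TSX hardware.

```c
#include <stdint.h>

/* Hypothetical abort reasons; each must fit in 8 bits. */
enum tx_abort_reason {
    TX_ABORT_HW        = 0,  /* not explicit: conflict, capacity, etc. */
    TX_ABORT_LOOP_EXIT = 1,  /* left generated code via the loop-exit path */
    TX_ABORT_EPILOGUE  = 2,  /* left generated code via the tcg epilogue */
};

/* ASSUMPTION: x86 RTM status layout. */
#define TX_STATUS_EXPLICIT  (1u << 0)

/* Build the status word an explicit abort with this code would produce. */
static uint32_t tx_make_status(enum tx_abort_reason reason)
{
    return ((uint32_t)reason << 24) | TX_STATUS_EXPLICIT;
}

/* Recover the software-controlled reason from a status word. */
static enum tx_abort_reason tx_decode_abort(uint32_t status)
{
    if (!(status & TX_STATUS_EXPLICIT)) {
        return TX_ABORT_HW;
    }
    return (enum tx_abort_reason)((status >> 24) & 0xff);
}
```

Anything larger than a small enum (a TB pointer, a guest PC) would have to be
stashed somewhere outside the transaction's write set before aborting.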
We should think about what kinds of limitations we should accept for handling
ll/sc via transactions.
  * How do we handle unpaired ldrexd / ldxp? This is used by the compiler,
    as it's the only way to perform a double-word atomic load.

    This implies that we need some sort of counter, beyond which we stop
    trying to succeed via transaction.

  * In order to make normal cmpxchg patterns work, we have to be able to
    handle a branch within a ll/sc sequence. Options:

    * Less complex way is to build a TB, including branches, with a max
      of N insns along the branch-not-taken path, searching for the strex.
      But of course this fails to handle legitimate patterns for arm
      (and other ll/sc guests).

      However, gcc code generation will generally annotate the cmpxchg
      failure branch as not-taken, so perhaps this will work well enough
      in practice.

    * More complex way is to build a TB, including branches, with a max
      of N insns along *all* paths, searching for the strex. This runs
      into problems with, among other things, branches crossing pages.

    * Most complex way is to somehow get all of the TBs built, and
      linked together, preferably before we even try executing
      (and failing the transaction in) the first TB.
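
For the least complex option, the search along the branch-not-taken path
could look roughly like the sketch below. It scans an array of already
fetched A64 insns, which is a simplification: a real implementation would
read guest memory and would have to stop at a page boundary. The masks are
my reading of the ARMv8.0 encodings and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: ARMv8.0 A64 store-exclusive (STXR/STLXR/STXP/STLXP). */
static bool a64_is_store_exclusive(uint32_t insn)
{
    return (insn & 0x3fc00000u) == 0x08000000u;
}

/* ASSUMPTION: B/BL (immediate) and BR/BLR/RET never fall through. */
static bool a64_is_uncond_branch(uint32_t insn)
{
    return (insn & 0x7c000000u) == 0x14000000u      /* B, BL */
        || (insn & 0xfe000000u) == 0xd6000000u;     /* BR, BLR, RET, ... */
}

/*
 * Scan at most max_insns insns following a load-exclusive, always taking
 * the fall-through path of conditional branches.  Returns the index of
 * the store-exclusive, or -1 if none is found within the limit or the
 * fall-through path ends first.
 */
static int find_store_exclusive(const uint32_t *code, int max_insns)
{
    for (int i = 0; i < max_insns; i++) {
        if (a64_is_store_exclusive(code[i])) {
            return i;
        }
        if (a64_is_uncond_branch(code[i])) {
            return -1;
        }
    }
    return -1;
}
```

A result of -1 is where a counter-based fallback would apply: an unpaired
ldxp has no matching store, so there is no point in starting a transaction.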
r~