Emilio G. Cota <c...@braap.org> writes:
> On Mon, Nov 26, 2018 at 15:16:00 +0000, Alex Bennée wrote: >> Emilio G. Cota <c...@braap.org> writes: > (snip) >> > + if (tb_trans_cb && first_pass) { >> > + qemu_plugin_tb_trans_cb(cpu, plugin_tb); >> > + first_pass = false; >> > + goto translate; >> > + } >> >> So the only reason we are doing this two pass tango is to ensure the >> plugin can insert TCG ops before the actual translation has occurred? > > Not only. The idea is to provide plugins with well-defined TBs, > i.e. the instruction sizes and contents can be queried by the plugin > before the plugin decides how/where to instrument the TB. Hmmm, this seems a little to close to internal knowledge of the TCG. Is the idea that a plugin might make a different decision based on the number of a particular type of instruction in the translation block? This seems like it would get broken if we wanted to implement other types of TranslationBlock (e.g. hot-blocks with multiple exits for the non-hot case). That said looking at the examples using it so far it doesn't seem to be doing more than looking at the total number of instructions for a given translation block. > Since in the targets we generate TCG code and also generate > host code in a single shot (TranslatorOps.translate_insn), > the 2-pass approach is a workaround to first get the > well-defined TB, and in the second pass inject the instrumentation > in the appropriate places. Richard's suggestion of providing a central translator_ldl_code could keep the book keeping of each instruction location and contents in the core translator. With a little tweaking to the TCG we could then insert our instrumentation at the end of the pass with all the knowledge we want to export to the plugin. Inserting instrumentation after instructions have executed will be trickier though due to reasons Peter mentioned on IRC. > This is a bit of a waste but given that it only happens at > translate time, it can have negligible performance impact -- > I measured a 0.13% gmean slowdown for SPEC06int. I'm less concerned about efficiency as complicating the code, especially if we are baking in concepts that restrict our freedom to change code generation around going forward. > >> I think we can do better, especially as the internal structures of >> TCGops are implemented as a list so ops and be inserted before and after >> other ops. This is currently only done by the optimiser at the moment, >> see: >> >> TCGOp *tcg_op_insert_before(TCGContext *s, TCGOp *op, TCGOpcode opc, int >> narg); >> TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *op, TCGOpcode opc, int >> narg); >> >> and all the base tcg ops end up going to tcg_emit_op which just appends >> to the tail. But if we can come up with a neater way to track the op >> used before the current translated expression we could do away with two >> phases translation completely. > > This list of ops is generated via TranslatorOps.translate_insn. > Unfortunately, this function also defines the guest insns that form the TB. > Decoupling the two actions ("define the TB" and "translate to TCG ops") > would be ideal, but I didn't want to rewrite all the target translators > in QEMU, and opted instead for the 2-pass approach as a compromise. I don't quite follow. When we've done all our translation into TCG ops haven't we by definition defined the TB? Maybe the interface shouldn't be per-insn and per-TB but just an arbitrary chunk of instructions. We could call the plugin with a list of instructions with some opaque data that can be passed back to the plugin APIs to allow insertion of instrumentation at the appropriate points. The initial implementation would be a single-pass and called after the TCG op generation. An instruction counter plugin would then be free to insert counter instrumentation as frequently or infrequently as it wants. These chunks wouldn't have to be tied to the internals of TCG and in the worst case we could just inform the plugin in 1 insn chunks without having to change the API? What do you think? -- Alex Bennée