Before seeing Alex's suggestion, I didn't realize that we could use the existing injection mechanism to enable plugins to count instructions accurately.

Now, I also agree that the new API introduced by this patch for the plugin subsystem could make the plugin API overly complex.


Many thanks to Alex and Pierrick for the timely review and helpful comments.



On 9/3/24 2:56 AM, Pierrick Bouvier wrote:
On 9/2/24 10:52, Alex Bennée wrote:
Pierrick Bouvier <pierrick.bouv...@linaro.org> writes:

Hi Xingran,

On 9/2/24 03:42, Alex Bennée wrote:
Xingran Wang <wangxingran123...@outlook.com> writes:

Currently, the instruction count obtained by plugins using the translation block execution callback is larger than the actual value. Adding callbacks in cpu_restore_state_from_tb() and cpu_io_recompile() allows plugins to
correct the instruction count when exiting a translation block
mid-execution, properly subtracting the excess unexecuted
instructions.
This smells like exposing too much of the TCG internals to the plugin
mechanism. You can already detect when we don't reach the end of a block
of instructions by instrumentation as I did in:


I agree that this is definitely a QEMU implementation "detail", and
should not be a concern for end users.

The documentation already warns that all instructions may not execute,
and that in this case, it's better to instrument them directly,
instead of TB:
https://www.qemu.org/docs/master/devel/tcg-plugins.html#translation-blocks.

Finally, even if we integrated an API like what you propose in this
patch, I think it would be very easy for plugin writers to make a
mistake using it, as you need to keep track of everything yourself.

If we want to stay on the topic of this patch, one direction that
would be better in my opinion is an "after_tb_exec" API, where the TB
passed as a parameter is guaranteed to have had all of its instructions
executed (not interrupted).

Or indeed resolves with the current PC at the "end" of the TB where it
gets to. QEMU could keep track of that easily enough as the recompile
and bus fault paths are slow paths anyway. It would be tricky to support
inline for that though.

As TB is exposed internally I think we'd just need to set a flag and
call out. Maybe an API like:

   /**
    * typedef qemu_plugin_vcpu_tb_end_cb_t - vcpu callback at end of block
    * @vcpu_index: the current vcpu context
    * @next_pc: the next PC
    * @n_insns: instructions executed in block
    * @userdata: a pointer to some user data supplied when the callback
    * was registered.
    */
   typedef void (*qemu_plugin_vcpu_tb_end_cb_t)(unsigned int vcpu_index,
                                                uint64_t next_pc,
                                                size_t n_insns,
                                                void *userdata);

   /**
    * qemu_plugin_register_vcpu_tb_exec_end_cb() - register execution callback at end of TB
    * @tb: the opaque qemu_plugin_tb handle for the translation
    * @cb: callback function
    * @flags: does the plugin read or write the CPU's registers?
    * @userdata: any plugin data to pass to the @cb?
    *
    * The @cb function is called every time a translated unit executes.
    */
   QEMU_PLUGIN_API
   void qemu_plugin_register_vcpu_tb_exec_end_cb(struct qemu_plugin_tb *tb,
                                                  qemu_plugin_vcpu_tb_end_cb_t cb,
                                                  enum qemu_plugin_cb_flags flags,
                                                  void *userdata);
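
To make the shape of such a callback concrete, here is a minimal sketch of how a counting plugin might consume it. The typedef is copied from the proposal above and is not part of QEMU's plugin API today; `InsnCounter`, `count_tb_end`, `fake_tb_end` and `MAX_VCPUS` are hypothetical names invented for this illustration, and `fake_tb_end` only stands in for QEMU calling the callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback type copied from the proposal; NOT an existing QEMU API. */
typedef void (*qemu_plugin_vcpu_tb_end_cb_t)(unsigned int vcpu_index,
                                             uint64_t next_pc,
                                             size_t n_insns,
                                             void *userdata);

#define MAX_VCPUS 8 /* hypothetical limit for this sketch */

/* Hypothetical plugin state: one running total per vCPU. */
typedef struct {
    uint64_t insns[MAX_VCPUS];   /* instructions that really executed */
    uint64_t last_pc[MAX_VCPUS]; /* PC each vCPU continues from */
} InsnCounter;

/* Has the proposed callback signature: add only what actually ran. */
static void count_tb_end(unsigned int vcpu_index, uint64_t next_pc,
                         size_t n_insns, void *userdata)
{
    InsnCounter *c = userdata;

    assert(c != NULL);
    if (vcpu_index >= MAX_VCPUS) {
        return; /* this sketch has no slot for that vCPU */
    }
    c->insns[vcpu_index] += n_insns;
    c->last_pc[vcpu_index] = next_pc;
}

/* Hypothetical stand-in for QEMU invoking the callback at the end of a TB. */
static void fake_tb_end(qemu_plugin_vcpu_tb_end_cb_t cb, unsigned int vcpu,
                        uint64_t next_pc, size_t n_insns, void *userdata)
{
    cb(vcpu, next_pc, n_insns, userdata);
}
```

Because the callback receives the number of instructions that were executed rather than the size of the block, the plugin would need no bookkeeping of its own for interrupted blocks.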


Something like this, yes.
I still think it makes the whole API too complex, and would confuse users. If plugin writers need "instruction accurate" instrumentation, there are already functions for that. And if the only use case is to identify control flow changes, then we could create a dedicated API for this.


The API that Alex proposed looks great and could replace the two event callbacks introduced in this patch. I also agree with Pierrick that the existing instrumentation mechanism already allows users to achieve instruction-accurate instrumentation. It would be even better if this API didn't rely on inserting TCG instructions before every instruction in the TB, as this would offer better performance compared to the original implementation for identifying control flow changes.

I wonder what is the original use case of Xingran. Any chance you could share with us why this is needed, and why existing functions are not enough?

Summary:

In retrospect, these two event callbacks don't serve a general purpose and can be replaced by the existing injection mechanism. Adding them would indeed make the API confusing and complex.

Detail:


We (myself and the XiangShan team) attempted to implement the "SimPoint" method using QEMU's plugin system and to accelerate RISC-V CPU core performance simulation with "SimPoint Checkpoint" and QEMU.

The first step of "SimPoint" is to profile the frequency of executed code to create code signatures representing the program's behavior at different execution points. This requires an accurate instruction count, which can be instrumented by the plugin.
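
As a rough illustration of this profiling step, the sketch below accumulates one code signature per fixed-size instruction interval, in the style of a basic block vector. The vector layout, the names (`BbvProfile`, `bbv_record`) and the sizes are assumptions made for illustration only; this is not the actual plugin code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_BLOCKS 4       /* hypothetical: distinct basic blocks tracked */
#define INTERVAL_INSNS 100 /* hypothetical: instructions per interval */

typedef struct {
    uint64_t weight[NUM_BLOCKS]; /* instructions executed per block */
    uint64_t insns_in_interval;  /* instructions seen in this interval */
    uint64_t intervals_done;     /* number of signatures emitted so far */
} BbvProfile;

/*
 * Record one execution of block `id` that executed `n_insns` instructions.
 * Returns 1 when the interval is complete; `out` then holds its signature
 * and the profile is reset for the next interval. Returns 0 otherwise.
 */
static int bbv_record(BbvProfile *p, unsigned id, uint64_t n_insns,
                      uint64_t out[NUM_BLOCKS])
{
    assert(p != NULL && out != NULL);
    if (id >= NUM_BLOCKS) {
        return 0; /* unknown block: ignored in this sketch */
    }
    p->weight[id] += n_insns;
    p->insns_in_interval += n_insns;
    if (p->insns_in_interval < INTERVAL_INSNS) {
        return 0;
    }
    memcpy(out, p->weight, sizeof(p->weight));
    memset(p->weight, 0, sizeof(p->weight));
    p->insns_in_interval = 0;
    p->intervals_done++;
    return 1;
}
```

In a scheme like this, any error in `n_insns` shifts both the per-block weights and the interval boundaries, which is why the count needs to be exact.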

I used a simple bare-metal program to verify that the instruction count instrumented by the plugin matched the actual value. However, I discovered that TCG exits the currently executing TB prematurely when it encounters an MMIO instruction or an exception, without notifying the count plugin, which leaves the instruction count higher than the actual value. As a QEMU novice, the straightforward solution for me was to have TCG notify the plugin whenever a TB is exited prematurely during execution. This is the role of the event callback in cpu_restore_state_from_tb(). The event callback in cpu_io_recompile(), on the other hand, was added as a workaround because the recompiled TB only allows memory instrumentation, making the instruction count for this short TB undetectable by the plugin.
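
The arithmetic behind the overcount and the correction can be shown with a small standalone model. This is not QEMU code: `CountPlugin`, `on_tb_exec`, `on_tb_restore` and `run_tb` are hypothetical names, and the model assumes that the restore notification reports exactly the number of instructions that did not execute. The cpu_io_recompile() case is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a counting plugin; not QEMU code. */
typedef struct {
    uint64_t counted; /* what the plugin believes was executed */
} CountPlugin;

/* TB execution callback: fires on TB entry, so it adds the whole block. */
static void on_tb_exec(CountPlugin *p, uint32_t tb_n)
{
    p->counted += tb_n;
}

/* Restore-state notification: subtract the instructions that never ran. */
static void on_tb_restore(CountPlugin *p, uint32_t insns_left)
{
    assert(insns_left <= p->counted);
    p->counted -= insns_left;
}

/*
 * Model one TB of `tb_n` instructions that stops after `executed` of them.
 * `notify` selects whether the early exit is reported to the plugin
 * (the behaviour the patch adds) or not (the current behaviour).
 */
static void run_tb(CountPlugin *p, uint32_t tb_n, uint32_t executed,
                   int notify)
{
    assert(executed <= tb_n);
    on_tb_exec(p, tb_n);
    if (notify && executed < tb_n) {
        on_tb_restore(p, tb_n - executed);
    }
}
```

For a 10-instruction TB that stops after 3 instructions, the unnotified plugin ends up with a count of 10, while the notified one ends up with 3.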


I think the tricky bit would be getting TCG to emit the callback code
for the last instruction before the
tcg_gen_exit_tb/tcg_gen_lookup_and_goto_ptr bits but after whatever else
it has done to execute the instruction.

I don't think we could easily support inline ops at tb end though.

Richard,

What do you think?


    Message-Id: <20240718145958.1315270-1-alex.ben...@linaro.org>
    Date: Thu, 18 Jul 2024 15:59:58 +0100
    Subject: [RFC PATCH v3] contrib/plugins: control flow plugin
    From: Alex Bennée <alex.ben...@linaro.org>
So what exactly are we trying to achieve here? A more efficient
detection of short blocks?


Signed-off-by: Xingran Wang <wangxingran123...@outlook.com>
---
   accel/tcg/translate-all.c    |  27 ++++++++
   include/qemu/plugin-event.h  |   2 +
   include/qemu/plugin.h        |  24 +++++++
   include/qemu/qemu-plugin.h   | 131 +++++++++++++++++++++++++++++++++++
   plugins/api.c                |  78 +++++++++++++++++++++
   plugins/core.c               |  42 +++++++++++
   plugins/qemu-plugins.symbols |  10 +++
   tests/tcg/plugins/bb.c       |  25 +++++++
   8 files changed, 339 insertions(+)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index fdf6d8ac19..642f684372 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -65,6 +65,7 @@
   #include "internal-target.h"
   #include "tcg/perf.h"
   #include "tcg/insn-start-words.h"
+#include "qemu/plugin.h"
     TBContext tb_ctx;
@@ -218,6 +219,19 @@ void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
           cpu->neg.icount_decr.u16.low += insns_left;
       }
+#ifdef CONFIG_PLUGIN
+    /*
+     * Notify the plugin with the relevant information
+     * when restoring the execution state of a TB.
+     */
+    struct qemu_plugin_tb_restore ptb_restore;
+    ptb_restore.cpu_index = cpu->cpu_index;
+    ptb_restore.insns_left = insns_left;
+    ptb_restore.tb_n = tb->icount;
+    ptb_restore.tb_pc = tb->pc;
+    qemu_plugin_tb_restore_cb(cpu, &ptb_restore);
+#endif
+
See also the unwind patches, which are a more generic approach to ensuring
"special" registers are synced at midpoint when using the register API:
    Message-Id: <20240606032926.83599-1-richard.hender...@linaro.org>
    Date: Wed,  5 Jun 2024 20:29:17 -0700
    Subject: [PATCH v2 0/9] plugins: Use unwind info for special gdb registers
    From: Richard Henderson <richard.hender...@linaro.org>
<snip>


Thanks,
Pierrick


