pmu_book3s_helper.c: enable PMC1 counter negative EBB

Daniel Henrique Barboza Sat, 14 Aug 2021 12:13:45 -0700



On 8/12/21 9:35 PM, Richard Henderson wrote:

On 8/12/21 11:24 AM, Daniel Henrique Barboza wrote:

+void helper_insns_inc(CPUPPCState *env)
+{
+    env->pmu_insns_count++;
+}
+
+void helper_insns_dec(CPUPPCState *env)
+{
+    env->pmu_insns_count--;
+}
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 60f35360b7..c56c656694 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -8689,6 +8689,7 @@ static void ppc_tr_tb_start(DisasContextBase *db, 
CPUState *cs)

  static void ppc_tr_insn_start(DisasContextBase *dcbase, CPUState *cs)
  {
+    gen_helper_insns_inc(cpu_env);
      tcg_gen_insn_start(dcbase->pc_next);
  }

@@ -8755,6 +8756,8 @@ static void ppc_tr_tb_stop(DisasContextBase *dcbase, 
CPUState *cs)
          return;
      }

+    gen_helper_insns_dec(cpu_env);
+
      /* Honor single stepping. */
      sse = ctx->singlestep_enabled & (CPU_SINGLE_STEP | GDBSTUB_SINGLE_STEP);
      if (unlikely(sse)) {


And then I used 'env->pmu_insns_count' to update the instruction counters. And 
it's
working, with a slightly better performance than we have with the icount()
version. I'm not sure why it's working now since I tried something very similar
before and it didn't. Figures.


Not sure why you're decrementing there.

Also, how do you want to count retired instructions? Ones that merely start?  
E.g. including loads that fault?  How about illegal instructions?  Or does the 
instruction need to run to completion?  Which of these you choose affects where 
you place the increment.


Before I proceed, let me mention that the numbers I'm about to post here are 
from the powerpc
selftest located here:

https://github.com/torvalds/linux/blob/master/tools/testing/selftests/powerpc/pmu/count_instructions.c

This test runs an instruction loop that consists of 'addi' instructions . 
Before running the instructions
there's an overhead calculation with an empty loop.


So, the event I want to count is described as "The thread has completed an 
instruction" in the ISA.
There are no events that counts illegal instructions or that causes fault, so 
my guess is that
all completed functions regardless of side-effects are counted.

Putting the incrementer in ppc_tr_insn_start() like I mentioned in my previous 
reply provides results
similar to the icount version:

# ./count_instructions
test: count_instructions
tags: git_version:v5.13-5357-gdbe69e433722-dirty
Binding to cpu 0
main test running as pid 568
Overhead of null loop: 2370 instructions
instructions: result 1002370 running/enabled 1587314
cycles: result 1001372 running/enabled 1348532
Looped for 1000000 instructions, overhead 2370
Expected 1002370
Actual   1002370
Delta    0, 0.000000%
instructions: result 10010309 running/enabled 11603340
cycles: result 10009311 running/enabled 11362216
Looped for 10000000 instructions, overhead 2370
Expected 10002370
Actual   10010309
Delta    7939, 0.079308%
[FAIL] Test FAILED on line 119
failure: count_instructions

I'm aware that putting the increment in insn_start() is too early since the 
instruction didn't
even complete, so I attempted to put the increment at the end of 
ppc_tr_translate_insn(). I
ended up with fewer instructions counted for the same test:

# ./count_instructions
test: count_instructions
tags: git_version:v5.13-5357-gdbe69e433722-dirty
Binding to cpu 0
main test running as pid 575
Overhead of null loop: 1962 instructions
instructions: result 939462 running/enabled 1587310
cycles: result 1001372 running/enabled 1348532
Looped for 1000000 instructions, overhead 1962
Expected 1001962
Actual   939462
Delta    -62500, -6.652744%
[FAIL] Test FAILED on line 116
failure: count_instructions

Reading translator_loop() I can't explain why  we're missing this amount of 
functions because
translate_insn() is always called right after insns_start():


    while (true) {
        db->num_insns++;

        /* NOTE: incrementing the counter in insn_start() ppc impl gives 
similar result as icount*/
        ops->insn_start(db, cpu);
        tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */

       (...)

        /* NOTE: incrementing the counter in translate_insn() PPC impl misses 
6% of the instructions,
        even though it's always called right after insn_start() */
        if (db->num_insns == db->max_insns && (cflags & CF_LAST_IO)) {
            /* Accept I/O on the last instruction.  */
            gen_io_start();
            ops->translate_insn(db, cpu);
        } else {
            /* we should only see CF_MEMI_ONLY for io_recompile */
            tcg_debug_assert(!(cflags & CF_MEMI_ONLY));
            ops->translate_insn(db, cpu);
        }

I've recompiled QEMU with --enable-tcg-debug to see if tcg_debug_assert() right 
after insn_start() was
being hit. The assert isn't being hit in the test.


I also tried to read the instructions in tb_stop() via db->num_insns. Since I'm 
not worried about missing
the counter_overflow a few instructions it would also be an alternative. I 
changed the helper to receive
a number of instructions to be added instead of single increments:

static void ppc_tr_tb_stop(DisasContextBase *dcbase, CPUState *cs)
     target_ulong nip = ctx->base.pc_next;
     int sse;

+ gen_helper_insns_inc(cpu_env, tcg_constant_i32(dcbase->num_insns));

+
     if (is_jmp == DISAS_NORETURN) {
         /* We have already exited the TB. */
         return;


The test gives us these numbers:

# ./count_instructions
test: count_instructions
tags: git_version:v5.13-5357-gdbe69e433722-dirty
Binding to cpu 0
main test running as pid 573
Overhead of null loop: 85 instructions
instructions: result 85 running/enabled 1587316
cycles: result 1001372 running/enabled 1348534
Looped for 1000000 instructions, overhead 85
Expected 1000085
Actual   85
Delta    -1000000, -1176470.588235%
[FAIL] Test FAILED on line 116
failure: count_instructions

This is the situation I've faced some time ago and commented on a previous 
email. My hypothesis
back then was that translation_loop(), which is called via 
gen_intermediate_code() - tb_gen_code(),
was not being executed because tb_gen_code() is always guarded by a tb_lookup() 
, and
assumed that the lookup was finding the already translated TB and cpu_tb_exec() 
with it.


Well, we just saw that incrementing the counter in both insn_start() and 
translate_insn() gives
a far greater number than trying to sum up all the db->num_insns in tb_stop(), 
and all of
them are called inside translator_loop(). This invalidates (I guess) the 
assumptions I made
on tb_gen_code() up above. So yeah, I need more time to understand why I'm 
getting these
numbers.

For now, I'm really tempted into increment the instructions at insn_start() and 
come back at
instruction counting specifics in a follow-up work.


It is of course extremely heavyweight to call a helper to perform a simple 
addition operation.


This increment would update all registers that are counting instructions and
also fire a PMU interrupt when an overflow happens. I also added the frozen
count PMU bit in 'hflags' so we can increment only if the PMU is running.


There's a good chance that most of this can be done outside the helper in a
TCG function in translate.c. That would minimize the impact of the running
PMU in the translation process. I'll see how it goes.


You may also wish to look at target/hexagon, gen_exec_counters(), which is 
doing the same sort of thing.  But not completely, since it loses count along 
dynamic exception paths (e.g. faulting load).  That can be fixed as well, via 
restore_state_to_opc.


I saw the hexagon code and attempted to accumulate the instructions in
a 'num_insn' ctx variable, that was incremented in insns_start(), and then
update the PMU counters in tb_stop() with it. The results were similar
to what I've described above with db->num_insns in tb_stop().



Thanks,


Daniel

r~

Re: [PATCH 12/19] target/ppc/pmu_book3s_helper.c: enable PMC1 counter negative EBB

Reply via email to