Pierrick Bouvier <pierrick.bouv...@linaro.org> writes: > Hi Julian, > > On 12/20/24 03:47, Julian Ganz wrote: >> Hi Pierrick, >> December 5, 2024 at 12:33 AM, "Pierrick Bouvier" wrote: >>> On 12/2/24 11:41, Julian Ganz wrote: >>>> +static void insn_exec(unsigned int vcpu_index, void *userdata) >>>> +{ >>>> + struct cpu_state *state = qemu_plugin_scoreboard_find(states, >>>> vcpu_index); >>>> + uint64_t pc = (uint64_t) userdata; >>>> + GString* report; >>>> + >>>> + if (state->has_next) { >>>> + if (state->next_pc != pc) { >>>> + report = g_string_new("Trap target PC mismatch\n"); >>>> + g_string_append_printf(report, >>>> + "Expected: %"PRIx64"\nEncountered: %" >>>> + PRIx64"\n", >>>> + state->next_pc, pc); >>>> + qemu_plugin_outs(report->str); >>>> + if (abort_on_mismatch) { >>>> + g_abort(); >>>> + } >>>> + g_string_free(report, true); >>>> + } >>>> + state->has_next = false; >>>> + } >>>> +} >>>> >>> When booting an arm64 vm, I get this message: >>> Trap target PC mismatch >>> Expected: 23faf3a80 >>> Encountered: 23faf3a84 >> A colleague of mine went to great lengths trying to track and >> reliably >> reproduce this. We think that it's something amiss with the existing >> instruction exec callback infrastructure. So... it's not something I'll >> be addressing with the next iteration as it's out of scope. We'll >> probably continue looking into it, though. >> The mismatch is reported perfectly normal and boring exceptions and >> interrupts with no indication of any differences to other (not reported) >> events that fire on a regular basis. Apparently, once in a blue moon >> (relatively speaking), for the first instruction of a handler (even >> though it is definitely executed and qemu does print a trace-line for >> that instruction): >> | Trace 0: 0x7fffa0b03900 >> [00104004/000000023fde73b4/00000021/ff020200] >> | Trace 0: 0x7fffa02d9580 [00104004/000000023fde72b8/00000021/ff020200] >> | Trace 0: 0x7fffa02dfc40 [00104004/000000023fde7338/00000021/ff020200] >> | Trace 0: 0x7fffa0b03d00 [00104004/000000023fde73d4/00000021/ff020200] >> | Trace 0: 0x7fffa0b03e80 [00104004/000000023fde73d8/00000021/ff020200] >> | Trace 0: 0x7fffa0b04140 [00104004/000000023fde7408/00000021/ff020200] >> | Trace 0: 0x7fffa02dd6c0 [00104004/000000023fde70b8/00000021/ff020200] >> | Trace 0: 0x7fffa02dd800 [00104004/000000023fde7b90/00000021/ff020200] >> | cpu_io_recompile: rewound execution of TB to 000000023fde7b90 >> | Taking exception 5 [IRQ] on CPU 0 >> | ...from EL1 to EL1 >> | ...with ESR 0x0/0x3800000 >> | ...with SPSR 0x20000305 >> | ...with ELR 0x23fde7b90 >> | ...to EL1 PC 0x23fd77a80 PSTATE 0x23c5 >> | Trace 0: 0x7fffa13a8340 [00104004/000000023fd77a80/00000021/ff021201] >> | Trace 0: 0x7fffa13a8480 [00104004/000000023fd77a84/00000021/ff020200] >> | Trap target PC mismatch CPU 0 >> | Expected: 23fd77a80 >> | Encountered: 23fd77a84 >> | warning: 44 ./nptl/pthread_kill.c: No such file or directory >> | Couldn't get registers: No such process. >> It does show up with both single-core and multi-core VMs, so that at >> least eliminates some possibilities. Maybe :/ >> The issue is nasty to reproduce in a way that allows any meaningful >> investigation. It usually involves sifting through many GBs of Qemu logs >> for maybe one occurance. We could add another testing/dummy plugin that >> just prints the PC for _any_ instruction executed and have a skript >> check for non-alternating Trace-lines from Qemu and that Plugin. But >> then we're talking nearly double the amount of Lines to look through >> with probably little additional information. >> > > Thanks for the investigation. > I could reproduce this with this command line: > ./build/qemu-system-aarch64 -M virt -plugin > ./build/tests/tcg/plugins/libdiscons.so,abort=on -m 8G -device > virtio-blk-pci,drive=root -drive > if=none,id=root,file=/home/user/.work/images/debianaarch64.img -M virt > -cpu max,pauth=off -drive > if=pflash,readonly=on,file=/usr/share/AAVMF/AAVMF_CODE.fd -drive > if=pflash,file=/home/user/.work/images/AAVMF_VARS.fd -d > plugin,in_asm,op -D crash.log > > # -d plugin,in_asm,op allows to dump asm of every translated block, > plugin output (for discon plugin), and tcg op generated. > > It reliably crashes with a single address. > Looking at the debug output (crash.log): > ---------------- > IN: > 0x23faf3a80: d108c3ff sub sp, sp, #0x230 > # => This bb has a single instruction as input > > OP: > # this is the TB instrumentation > ld_i32 loc0,env,$0xfffffffffffffff8 > brcond_i32 loc0,$0x0,lt,$L0 > st8_i32 $0x1,env,$0xfffffffffffffffc > > ---- 0000000000000a80 0000000000000000 0000000000000000 > # => we can see that there is no call_plugin, looks like > instrumentation # is not applied
That is expected. We have previously called the instrumentation for the instruction in the block that triggered cpu_io_recompile() so we suppress everything apart from memory instrumentation for the re-execution so we don't double count. > sub_i64 sp,sp,$0x230 > add_i64 pc,pc,$0x4 > goto_tb $0x1 > exit_tb $0x7f7eedd355c1 > set_label $L0 > exit_tb $0x7f7eedd355c3 > > ---------------- > IN: > 0x23faf3a84: a9b007e0 stp x0, x1, [sp, #-0x100]! > 0x23faf3a88: a9010fe2 stp x2, x3, [sp, #0x10] > ... > > OP: > ld_i32 loc0,env,$0xfffffffffffffff8 > brcond_i32 loc0,$0x0,lt,$L0 > st8_i32 $0x0,env,$0xfffffffffffffffc > > ---- 0000000000000a84 0000000000000000 0000000000000000 > # instruction is correctly applied > call plugin(0x7f7eec96d530),$0x1,$0,$0x0,$0x23faf3a84 > mov_i64 loc2,sp > ... > > Trap target PC mismatch > Expected: 23faf3a80 > Encountered: 23faf3a84 > > The thing interesting here is that we can notice that 23faf3a80 is a > translation block with a single instruction, and we can see that > instrumentation is not applied for this instruction (call_plugin is > not present). > > Overall, it really looks like a bug on QEMU side, where we miss > instrumenting something. I'll take a look. You can ignore this for > now. See my other email. I think the bug was allowing an async IRQ to interfere with our "special" single instruction block. > >> Regards, >> Julian -- Alex Bennée Virtualisation Tech Lead @ Linaro