Re: [RFC PATCH v3 11/11] tests: add plugin asserting correctness of discon event's to_pc

Julian Ganz Fri, 20 Dec 2024 03:47:52 -0800

Hi Pierrick,

December 5, 2024 at 12:33 AM, "Pierrick Bouvier" wrote:
> On 12/2/24 11:41, Julian Ganz wrote:
> >  +static void insn_exec(unsigned int vcpu_index, void *userdata)
> >  +{
> >  + struct cpu_state *state = qemu_plugin_scoreboard_find(states, 
> > vcpu_index);
> >  + uint64_t pc = (uint64_t) userdata;
> >  + GString* report;
> >  +
> >  + if (state->has_next) {
> >  + if (state->next_pc != pc) {
> >  + report = g_string_new("Trap target PC mismatch\n");
> >  + g_string_append_printf(report,
> >  + "Expected: %"PRIx64"\nEncountered: %"
> >  + PRIx64"\n",
> >  + state->next_pc, pc);
> >  + qemu_plugin_outs(report->str);
> >  + if (abort_on_mismatch) {
> >  + g_abort();
> >  + }
> >  + g_string_free(report, true);
> >  + }
> >  + state->has_next = false;
> >  + }
> >  +}
> > 
> When booting an arm64 vm, I get this message:
> Trap target PC mismatch
> Expected: 23faf3a80
> Encountered: 23faf3a84


A colleague of mine went to great lengths trying to track and reliably
reproduce this. We think that it's something amiss with the existing
instruction exec callback infrastructure. So... it's not something I'll
be addressing with the next iteration as it's out of scope. We'll
probably continue looking into it, though.

The mismatch is reported perfectly normal and boring exceptions and
interrupts with no indication of any differences to other (not reported)
events that fire on a regular basis. Apparently, once in a blue moon
(relatively speaking), for the first instruction of a handler (even
though it is definitely executed and qemu does print a trace-line for
that instruction):

| Trace 0: 0x7fffa0b03900 [00104004/000000023fde73b4/00000021/ff020200]
| Trace 0: 0x7fffa02d9580 [00104004/000000023fde72b8/00000021/ff020200]
| Trace 0: 0x7fffa02dfc40 [00104004/000000023fde7338/00000021/ff020200]
| Trace 0: 0x7fffa0b03d00 [00104004/000000023fde73d4/00000021/ff020200]
| Trace 0: 0x7fffa0b03e80 [00104004/000000023fde73d8/00000021/ff020200]
| Trace 0: 0x7fffa0b04140 [00104004/000000023fde7408/00000021/ff020200]
| Trace 0: 0x7fffa02dd6c0 [00104004/000000023fde70b8/00000021/ff020200]
| Trace 0: 0x7fffa02dd800 [00104004/000000023fde7b90/00000021/ff020200]
| cpu_io_recompile: rewound execution of TB to 000000023fde7b90
| Taking exception 5 [IRQ] on CPU 0
| ...from EL1 to EL1
| ...with ESR 0x0/0x3800000
| ...with SPSR 0x20000305
| ...with ELR 0x23fde7b90
| ...to EL1 PC 0x23fd77a80 PSTATE 0x23c5
| Trace 0: 0x7fffa13a8340 [00104004/000000023fd77a80/00000021/ff021201]
| Trace 0: 0x7fffa13a8480 [00104004/000000023fd77a84/00000021/ff020200]
| Trap target PC mismatch CPU 0
| Expected:    23fd77a80
| Encountered: 23fd77a84
| warning: 44   ./nptl/pthread_kill.c: No such file or directory
| Couldn't get registers: No such process.

It does show up with both single-core and multi-core VMs, so that at
least eliminates some possibilities. Maybe :/

The issue is nasty to reproduce in a way that allows any meaningful
investigation. It usually involves sifting through many GBs of Qemu logs
for maybe one occurance. We could add another testing/dummy plugin that
just prints the PC for _any_ instruction executed and have a skript
check for non-alternating Trace-lines from Qemu and that Plugin. But
then we're talking nearly double the amount of Lines to look through
with probably little additional information.

Regards,
Julian

Re: [RFC PATCH v3 11/11] tests: add plugin asserting correctness of discon event's to_pc

Reply via email to