On Sep 17 12:05, Alex Bennée wrote:
> Aaron Lindsay <aa...@os.amperecomputing.com> writes:
> > In looking at QEMU's source, I *think* this is because the
> > `gen_store_exclusive` function in translate-a64.c is not making the same
> > calls to `plugin_gen_mem_callbacks` & company that are being made by
> > "normal" stores handled by functions like `tcg_gen_qemu_st_i64` (at
> > least in my case; I do see some code paths under `gen_store_exclusive`
> > call down into `tcg_gen_qemu_st_i64` eventually, but it appears not all
> > of them do?).
>
> The key TCG operation is the cmpxchg which does the effective store. For
> -smp 1 we should use normal ld and st tcg ops. For > 1 it eventually
> falls to tcg_gen_atomic_cmpxchg_XX which is a helper. That eventually
> ends up at:
>
>   atomic_trace_rmw_post
>
> which should be where things are hooked.

If I am understanding you correctly, my `stxp` should be using the
"normal" load and store tcg ops since I am running with `-smp 1`, and
should therefore be emitting plugin memory callbacks correctly. I think
my next step is to figure out exactly which tcg code path is being used
for this instruction, to remove any doubt about what's going on here.

> > Does my initial guess check out? And, if so, does anyone have insight
> > into how to fix this issue most cleanly/generically? I suspect if/when I
> > debug my particular case I can discover one code path to fix, but I'm
> > wondering if my discovery may be part of a larger class of cases which
> > fell through the cracks and ought to be fixed together.
>
> Have you got simple example of a test case?

My test case is reasonably simple - I can reproduce the issue reliably
and in under 5 minutes - but I don't currently have a self-contained
version in a form I can share.

Here is the surrounding dynamic instruction stream, as reported by the
plugin interface (via callbacks registered with
`qemu_plugin_register_vcpu_insn_exec_cb`), along with corresponding
memory accesses (reported via callbacks registered with
`qemu_plugin_register_vcpu_mem_cb`):

        pc         (  opcode  ): `disassembly`
------------------|-------------|-------------
0xffff0000082076b4 (0x9436c8a9): `bl #0xffff000008fb9958`
0xffff000008fb9958 (0xf9800091): `prfm pstl1strm, [x4]`
0xffff000008fb995c (0xc87f4490): `ldxp x16, x17, [x4]`
    ^ accesses virtual addresses: 0xffff8002fffdde60, 0xffff8002fffdde68
0xffff000008fb9960 (0xca000210): `eor x16, x16, x0`
0xffff000008fb9964 (0xca010231): `eor x17, x17, x1`
0xffff000008fb9968 (0xaa110211): `orr x17, x16, x17`
0xffff000008fb996c (0xb5000071): `cbnz x17, #0xffff000008fb9978`
0xffff000008fb9970 (0xc8300c82): `stxp w16, x2, x3, [x4]`
0xffff000008fb9974 (0x35ffff50): `cbnz w16, #0xffff000008fb995c`
0xffff000008fb9978 (0xaa1103e0): `mov x0, x17`
0xffff000008fb997c (0xd65f03c0): `ret`
0xffff0000082076b8 (0xd503201f): `nop`
0xffff0000082076bc (0xd503201f): `nop`
0xffff0000082076c0 (0xd503201f): `nop`
0xffff0000082076c4 (0xb94010a1): `ldr w1, [x5, #0x10]`
    ^ accesses virtual addresses: 0xffff8002f18b5cd0
0xffff0000082076c8 (0x51000421): `sub w1, w1, #1`
0xffff0000082076cc (0xb90010a1): `str w1, [x5, #0x10]`
    ^ accesses virtual addresses: 0xffff8002f18b5cd0
0xffff0000082076d0 (0x35000061): `cbnz w1, #0xffff0000082076dc`

Notice that the `stxp` receives no corresponding callbacks via
`qemu_plugin_register_vcpu_mem_cb` like the `ldxp`, `ldr`, and `str` do.

-Aaron
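
For reference when tracing which translator path these two instructions
take, here is a small Python sketch that splits the `ldxp` and `stxp`
opcodes from the trace above into their A64 load/store-exclusive fields.
It is not QEMU code: the function name `decode_ldst_excl` and the
dictionary keys are made up for illustration, and the bit positions are
my reading of the A64 load/store exclusive encoding layout.

```python
def decode_ldst_excl(insn):
    """Split an A64 load/store-exclusive encoding into its fields.

    Hypothetical helper for illustration only; not part of QEMU.
    """
    # Bits [29:24] == 0b001000 select the load/store exclusive group.
    if (insn >> 24) & 0x3F != 0b001000:
        raise ValueError(f"{insn:#010x} is not a load/store exclusive encoding")
    return {
        "size": (insn >> 30) & 0x3,            # bits [31:30]
        "o2": (insn >> 23) & 1,                # bit 23
        "is_load": bool((insn >> 22) & 1),     # L bit: 1 = load, 0 = store
        "is_pair": bool((insn >> 21) & 1),     # o1 bit: 1 = register pair
        "rs": (insn >> 16) & 0x1F,             # status register (stores)
        "o0": (insn >> 15) & 1,                # acquire/release ordering bit
        "rt2": (insn >> 10) & 0x1F,            # second data register
        "rn": (insn >> 5) & 0x1F,              # base address register
        "rt": insn & 0x1F,                     # first data register
    }


# The two exclusive-pair opcodes taken from the trace above.
for opcode in (0xC87F4490, 0xC8300C82):
    print(f"{opcode:#010x}: {decode_ldst_excl(opcode)}")
```

Both opcodes decode as pair accesses that differ in the load/store bit,
so the `stxp` should go down the store side of the exclusive-pair
handling, whichever tcg ops that ends up emitting.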