Igor Mammedov <imamm...@redhat.com> writes: > On Thu, 23 Jan 2025 12:23:43 +0100 > Igor Mammedov <imamm...@redhat.com> wrote: > >> On Thu, 23 Jan 2025 10:52:15 +0000 >> Alex Bennée <alex.ben...@linaro.org> wrote: >> >> > Igor Mammedov <imamm...@redhat.com> writes: >> > >> > > QEMU will crash with following debug enabled >> > > # define DEBUG_TLB_GATE 1 >> > > # define DEBUG_TLB_LOG_GATE 1 >> > > due to [1] introduced assert and as it >> > > happenstlb_flush_by_mmuidx[_async_work] >> > > functions are called not only from vcpu thread but also from reset >> > > handler >> > > that is called from main thread at cpu realize time when vcpu is already >> > > created >> > > x86_cpu_new -> ... -> >> > > x86_cpu_realizefn -> cpu_reset -> ... -> >> > > tcg_cpu_reset_hold >> > > >> > > drop assert to fix crash. >> > >> > Hmm the assert is there for a good reason because we do not want to be >> > flushing another CPUs state. However the assert itself: >> > >> > g_assert(!(cpu)->created || qemu_cpu_is_self(cpu)); >> > >> > was trying to account for pre-initialised vCPUs. What has changed? >> > >> > cpu_thread_signal_created(cpu) is called just before we start running >> > the main loop in mttcg_cpu_thread_fn. So any other thread messing with >> > the CPUs TLB can potentially mess things up. >> >> it reproduces on current master, so yes it likely has changed over time. >> I've just stumbled on it when attempting to get rid of cpu->created usage. >> >> >> > > 1) >> > > Fixes: f0aff0f124028 ("cputlb: add assert_cpu_is_self checks") > > bisection points to (so above fixes was a wrong one): > 30933c4fb4f3d tcg/cputlb: remove other-cpu capability from TLB flushing > which has replaced a check with assert: > > - if (cpu->created && !qemu_cpu_is_self(cpu)) { > - async_run_on_cpu(cpu, tlb_flush_by_mmuidx_async_work, > - RUN_ON_CPU_HOST_INT(idxmap)); > - } else { > - tlb_flush_by_mmuidx_async_work(cpu, RUN_ON_CPU_HOST_INT(idxmap)); > - } > + assert_cpu_is_self(cpu); > > should we revert that instead? > > perhaps also drop 'cpu->created' check in assert_cpu_is_self as it > obviously doesn't work.
That's because the asserts only check when built with debug due to concerns about the hot path. I think in the case of cputlb we should probably use tcg_debug_assert() which at least gets enabled for --enable-debug builds without needing to manually patch the define for DEBUG_TLB. Converting the custom TLB log implementation to using tracepoints would also be nice. > > >> > > Signed-off-by: Igor Mammedov <imamm...@redhat.com> >> > > --- >> > > accel/tcg/cputlb.c | 4 ---- >> > > 1 file changed, 4 deletions(-) >> > > >> > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c >> > > index b26c0e088f..2da803103c 100644 >> > > --- a/accel/tcg/cputlb.c >> > > +++ b/accel/tcg/cputlb.c >> > > @@ -381,8 +381,6 @@ static void tlb_flush_by_mmuidx_async_work(CPUState >> > > *cpu, run_on_cpu_data data) >> > > uint16_t all_dirty, work, to_clean; >> > > int64_t now = get_clock_realtime(); >> > > >> > > - assert_cpu_is_self(cpu); >> > > - >> > > tlb_debug("mmu_idx:0x%04" PRIx16 "\n", asked); >> > > >> > > qemu_spin_lock(&cpu->neg.tlb.c.lock); >> > > @@ -419,8 +417,6 @@ void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t >> > > idxmap) >> > > { >> > > tlb_debug("mmu_idx: 0x%" PRIx16 "\n", idxmap); >> > > >> > > - assert_cpu_is_self(cpu); >> > > - >> > > tlb_flush_by_mmuidx_async_work(cpu, RUN_ON_CPU_HOST_INT(idxmap)); >> > > } >> > >> -- Alex Bennée Virtualisation Tech Lead @ Linaro