On Thu, 23 Jan 2025 12:23:43 +0100 Igor Mammedov <imamm...@redhat.com> wrote:
> On Thu, 23 Jan 2025 10:52:15 +0000 > Alex Bennée <alex.ben...@linaro.org> wrote: > > > Igor Mammedov <imamm...@redhat.com> writes: > > > > > QEMU will crash with following debug enabled > > > # define DEBUG_TLB_GATE 1 > > > # define DEBUG_TLB_LOG_GATE 1 > > > due to [1] introduced assert and as it > > > happenstlb_flush_by_mmuidx[_async_work] > > > functions are called not only from vcpu thread but also from reset handler > > > that is called from main thread at cpu realize time when vcpu is already > > > created > > > x86_cpu_new -> ... -> > > > x86_cpu_realizefn -> cpu_reset -> ... -> > > > tcg_cpu_reset_hold > > > > > > drop assert to fix crash. > > > > Hmm the assert is there for a good reason because we do not want to be > > flushing another CPUs state. However the assert itself: > > > > g_assert(!(cpu)->created || qemu_cpu_is_self(cpu)); > > > > was trying to account for pre-initialised vCPUs. What has changed? > > > > cpu_thread_signal_created(cpu) is called just before we start running > > the main loop in mttcg_cpu_thread_fn. So any other thread messing with > > the CPUs TLB can potentially mess things up. > > it reproduces on current master, so yes it likely has changed over time. > I've just stumbled on it when attempting to get rid of cpu->created usage. > > > > > 1) > > > Fixes: f0aff0f124028 ("cputlb: add assert_cpu_is_self checks") bisection points to (so above fixes was a wrong one): 30933c4fb4f3d tcg/cputlb: remove other-cpu capability from TLB flushing which has replaced a check with assert: - if (cpu->created && !qemu_cpu_is_self(cpu)) { - async_run_on_cpu(cpu, tlb_flush_by_mmuidx_async_work, - RUN_ON_CPU_HOST_INT(idxmap)); - } else { - tlb_flush_by_mmuidx_async_work(cpu, RUN_ON_CPU_HOST_INT(idxmap)); - } + assert_cpu_is_self(cpu); should we revert that instead? perhaps also drop 'cpu->created' check in assert_cpu_is_self as it obviously doesn't work. > > > Signed-off-by: Igor Mammedov <imamm...@redhat.com> > > > --- > > > accel/tcg/cputlb.c | 4 ---- > > > 1 file changed, 4 deletions(-) > > > > > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > > > index b26c0e088f..2da803103c 100644 > > > --- a/accel/tcg/cputlb.c > > > +++ b/accel/tcg/cputlb.c > > > @@ -381,8 +381,6 @@ static void tlb_flush_by_mmuidx_async_work(CPUState > > > *cpu, run_on_cpu_data data) > > > uint16_t all_dirty, work, to_clean; > > > int64_t now = get_clock_realtime(); > > > > > > - assert_cpu_is_self(cpu); > > > - > > > tlb_debug("mmu_idx:0x%04" PRIx16 "\n", asked); > > > > > > qemu_spin_lock(&cpu->neg.tlb.c.lock); > > > @@ -419,8 +417,6 @@ void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t > > > idxmap) > > > { > > > tlb_debug("mmu_idx: 0x%" PRIx16 "\n", idxmap); > > > > > > - assert_cpu_is_self(cpu); > > > - > > > tlb_flush_by_mmuidx_async_work(cpu, RUN_ON_CPU_HOST_INT(idxmap)); > > > } > > >