On Mon, 27 Jan 2025 14:24:56 +0100 Philippe Mathieu-Daudé <phi...@linaro.org> wrote:
> +Peter > > On 23/1/25 14:38, Igor Mammedov wrote: > > On Thu, 23 Jan 2025 12:09:59 +0000 > > Alex Bennée <alex.ben...@linaro.org> wrote: > > > >> Igor Mammedov <imamm...@redhat.com> writes: > >> > >>> On Thu, 23 Jan 2025 10:52:15 +0000 > >>> Alex Bennée <alex.ben...@linaro.org> wrote: > >>> > >>>> Igor Mammedov <imamm...@redhat.com> writes: > >>>> > >>>>> QEMU will crash with following debug enabled > >>>>> # define DEBUG_TLB_GATE 1 > >>>>> # define DEBUG_TLB_LOG_GATE 1 > >>>>> due to [1] introduced assert and as it > >>>>> happenstlb_flush_by_mmuidx[_async_work] > >>>>> functions are called not only from vcpu thread but also from reset > >>>>> handler > >>>>> that is called from main thread at cpu realize time when vcpu is already > >>>>> created > >>>>> x86_cpu_new -> ... -> > >>>>> x86_cpu_realizefn -> cpu_reset -> ... -> > >>>>> tcg_cpu_reset_hold > >>>>> > >>>>> drop assert to fix crash. > >>>> > >>>> Hmm the assert is there for a good reason because we do not want to be > >>>> flushing another CPUs state. However the assert itself: > >>>> > >>>> g_assert(!(cpu)->created || qemu_cpu_is_self(cpu)); > >>>> > >>>> was trying to account for pre-initialised vCPUs. What has changed? > >>>> > >>>> cpu_thread_signal_created(cpu) is called just before we start running > >>>> the main loop in mttcg_cpu_thread_fn. So any other thread messing with > >>>> the CPUs TLB can potentially mess things up. > >>> > >>> it reproduces on current master, so yes it likely has changed over time. > >>> I've just stumbled on it when attempting to get rid of cpu->created > >>> usage. > >> > >> Why the drive to get rid of cpu->created? > > > > During review of Philippe's cpu cleanups, > > I've noticed that cpu->created is mostly used for signaling > > main loop thread we've started vcpu thread with a couple of > > odd cases in tcg and kvm. > > - 1st silently bit-rotted being behind ifdefs > > - 2nd is work around CPU being prematurely visible to others > > 2nd is: > > commit 56adee407fc564da19e49cfe18e20e3da92320be > Author: Peter Xu <pet...@redhat.com> > Date: Fri Feb 17 00:18:32 2023 +0800 > > kvm: dirty-ring: Fix race with vcpu creation > > It's possible that we want to reap a dirty ring on a vcpu that is > during > creation, because the vcpu is put onto list (CPU_FOREACH visible) > before > initialization of the structures. In this case: > > qemu_init_vcpu > x86_cpu_realizefn > cpu_exec_realizefn > cpu_list_add <---- can be probed by CPU_FOREACH > qemu_init_vcpu > cpus_accel->create_vcpu_thread(cpu); > kvm_init_vcpu > map kvm_dirty_gfns <--- kvm_dirty_gfns valid > > Don't try to reap dirty ring on vcpus during creation or it'll crash. > > Looking at cpu_list_add() in cpu_exec_realizefn(): > > hw/core/cpu-common.c-190- /* Wait until cpu initialization complete > before exposing cpu. */ > hw/core/cpu-common.c:191: cpu_list_add(cpu); > > IMO the problem is with cpu_list_add(), we shouldn't expose the vCPU > until it is realized. that was my reasoning as well > > cpu_list_add() seems to be doing 2 things, auto-assign CPU index if > UNASSIGNED_CPU_INDEX, then insert to global cpus_queue. > > IIRC cpu_list_add() is called early because various cpu init code > expects cpu->index to be assigned. > > Maybe we could extract the 'safely assign an unique cpu index' part > (guarding by qemu_cpu_list_lock), having cpu_list_add() just add to > the global queue. I'll give it a try... ok, lets see if we can postpone cpu_list_add() till the end of realize time. > > > > >> I guess we could assert: > >> > >> g_assert(!current_cpu || qemu_cpu_is_self(cpu); > >> > >> as current_cpu should only be set as we go into the main thread. However > >> there is a sketchy setting of current_cpu in cpu_exec() that I'm not > >> sure should be there. Also do_run_on_cpu() messes with current_cpu in a > >> way I don't fully understand either. > > > > I'd rather not rely on that, even if it works it would be subject to > > to the same kind of breakage. > > > > How about instead of workaround check we would have 2 variants > > of tlb_flush_by_mmuidx[_async_work], one that have self check > > and other for usage externally (i.e. from reset handler). > > That won't have to rely on sketchy globals (which becomes more > > sketchy in context of Philippes's multi accel work), > > and it would clearly document what can be used externally. > > > >> > >>> > >>> > >>>>> 1) > >>>>> Fixes: f0aff0f124028 ("cputlb: add assert_cpu_is_self checks") > >>>>> Signed-off-by: Igor Mammedov <imamm...@redhat.com> > >>>>> --- > >>>>> accel/tcg/cputlb.c | 4 ---- > >>>>> 1 file changed, 4 deletions(-) > >>>>> > >>>>> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > >>>>> index b26c0e088f..2da803103c 100644 > >>>>> --- a/accel/tcg/cputlb.c > >>>>> +++ b/accel/tcg/cputlb.c > >>>>> @@ -381,8 +381,6 @@ static void tlb_flush_by_mmuidx_async_work(CPUState > >>>>> *cpu, run_on_cpu_data data) > >>>>> uint16_t all_dirty, work, to_clean; > >>>>> int64_t now = get_clock_realtime(); > >>>>> > >>>>> - assert_cpu_is_self(cpu); > >>>>> - > >>>>> tlb_debug("mmu_idx:0x%04" PRIx16 "\n", asked); > >>>>> > >>>>> qemu_spin_lock(&cpu->neg.tlb.c.lock); > >>>>> @@ -419,8 +417,6 @@ void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t > >>>>> idxmap) > >>>>> { > >>>>> tlb_debug("mmu_idx: 0x%" PRIx16 "\n", idxmap); > >>>>> > >>>>> - assert_cpu_is_self(cpu); > >>>>> - > >>>>> tlb_flush_by_mmuidx_async_work(cpu, RUN_ON_CPU_HOST_INT(idxmap)); > >>>>> } > >>>> > >> > > > > >