Hi Eduardo, Sorry for the delay.
On 2020/12/22 5:36, Eduardo Habkost wrote: > On Mon, Dec 21, 2020 at 07:36:18PM +0800, Zhenyu Ye wrote: >> Providing a optional mechanism to wait for all VCPU threads be >> created out of qemu_init_vcpu(), then we can initialize the cpu >> concurrently on the x86 architecture. >> >> This reduces the time of creating virtual machines. For example, when >> the haxm is used as the accelerator, cpus_accel->create_vcpu_thread() >> will cause at least 200ms for each cpu, extremely prolong the boot >> time. >> >> Signed-off-by: Eduardo Habkost <ehabk...@redhat.com> >> Signed-off-by: eillon <yezhen...@huawei.com> > > The patch is easier to follow now, but I have a question that may > be difficult to answer: > > What exactly is the meaning of cpu->created=true, and what > exactly would break if we never wait for cpu->created==true at all? > > I'm asking that because we might be introducing subtle races > here, if some of the remaining CPU initialization code in > x86_cpu_realizefn() [1] expects the VCPU thread to be already > initialized. > > The cpu_reset() call below is one such example (but probably not > the only one). cpu_reset() ends up calling > kvm_arch_reset_vcpu(), which seems to assume kvm_init_vcpu() was > already called. With your patch, kvm_init_vcpu() might end up > being called after kvm_arch_reset_vcpu(). > There's a chance that this happens. Could we move these (after qemu_init_vcpu()) out of x86_cpu_realizefn() to the x86_cpus_init(), after qemu_wait_all_vcpu_threads_init()? Such as: void x86_cpus_init() { foreach (cpu) { x86_cpu_new(); } qemu_wait_all_vcpu_threads_init(); foreach (cpu) { x86_cpu_new_post(); } } > Maybe a simpler alternative is to keep the existing thread > creation logic, but changing hax_cpu_thread_fn() to do less work > before calling cpu_thread_signal_created()? > > In my testing (without this patch), creation of 8 KVM VCPU > threads in a 4 core machine takes less than 3 ms. Why is > qemu_init_vcpu() taking so long on haxm? Which parts of haxm > initialization can be moved after cpu_thread_signal_created(), to > make this better? > The most time-consuming operation in haxm is ioctl(HAX_VM_IOCTL_VCPU_CREATE). Saddly this can not be split. Even if we fix the problem in haxm, other accelerators may also have this problem. So I think if we can make the x86_cpu_new() concurrently, we should try to do it. Thanks, Zhenyu