Hi Eduardo,

Sorry for the delay.

On 2020/12/22 5:36, Eduardo Habkost wrote:
> On Mon, Dec 21, 2020 at 07:36:18PM +0800, Zhenyu Ye wrote:
>> Providing a optional mechanism to wait for all VCPU threads be
>> created out of qemu_init_vcpu(), then we can initialize the cpu
>> concurrently on the x86 architecture.
>>
>> This reduces the time of creating virtual machines. For example, when
>> the haxm is used as the accelerator, cpus_accel->create_vcpu_thread()
>> will cause at least 200ms for each cpu, extremely prolong the boot
>> time.
>>
>> Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
>> Signed-off-by: eillon <yezhen...@huawei.com>
> 
> The patch is easier to follow now, but I have a question that may
> be difficult to answer:
> 
> What exactly is the meaning of cpu->created=true, and what
> exactly would break if we never wait for cpu->created==true at all?
> 
> I'm asking that because we might be introducing subtle races
> here, if some of the remaining CPU initialization code in
> x86_cpu_realizefn() [1] expects the VCPU thread to be already
> initialized.
> 
> The cpu_reset() call below is one such example (but probably not
> the only one).  cpu_reset() ends up calling
> kvm_arch_reset_vcpu(), which seems to assume kvm_init_vcpu() was
> already called.  With your patch, kvm_init_vcpu() might end up
> being called after kvm_arch_reset_vcpu().
> 

There's a chance that this happens.
Could we move these (after qemu_init_vcpu()) out of x86_cpu_realizefn()
to the x86_cpus_init(), after qemu_wait_all_vcpu_threads_init()?
Such as:

void x86_cpus_init()
{
        foreach (cpu) {
                x86_cpu_new();
        }

        qemu_wait_all_vcpu_threads_init();

        foreach (cpu) {
                x86_cpu_new_post();
        }
}

> Maybe a simpler alternative is to keep the existing thread
> creation logic, but changing hax_cpu_thread_fn() to do less work
> before calling cpu_thread_signal_created()?
> 
> In my testing (without this patch), creation of 8 KVM VCPU
> threads in a 4 core machine takes less than 3 ms.  Why is
> qemu_init_vcpu() taking so long on haxm?  Which parts of haxm
> initialization can be moved after cpu_thread_signal_created(), to
> make this better?
> 

The most time-consuming operation in haxm is ioctl(HAX_VM_IOCTL_VCPU_CREATE).
Saddly this can not be split.

Even if we fix the problem in haxm, other accelerators may also have
this problem.  So I think if we can make the x86_cpu_new() concurrently,
we should try to do it.

Thanks,
Zhenyu

Reply via email to