Hi Daniel,

On 2020/9/14 22:42, Daniel P. Berrangé wrote:
> On Tue, Aug 11, 2020 at 09:54:08PM +0800, Zhenyu Ye wrote:
>> Hi Kevin,
>>
>> On 2020/8/10 23:38, Kevin Wolf wrote:
>>> On 10.08.2020 at 16:52, Zhenyu Ye wrote:
>>>> Before doing QMP actions, we need to lock the qemu_global_mutex,
>>>> so the QMP actions should not take too long.
>>>>
>>>> Unfortunately, some QMP actions need to acquire an AioContext,
>>>> and this may take a long time. The VM will soft lockup if this
>>>> time is too long.
>>>
>>> Do you have a specific situation in mind where getting the lock of
>>> an AioContext can take a long time? I know that the main thread can
>>> block for considerable time, but QMP commands run in the main
>>> thread, so this patch doesn't change anything for this case. It
>>> would be effective if an iothread blocks, but shouldn't everything
>>> running in an iothread be asynchronous and therefore keep the
>>> AioContext lock only for a short time?
>>
>> Theoretically, everything running in an iothread is asynchronous.
>> However, some 'asynchronous' actions are not entirely non-blocking;
>> io_submit() is one example. It can block when the iodepth is large
>> and the I/O pressure is high. If we run a QMP command such as
>> 'info block' at that moment, it may cause a VM soft lockup. This
>> series makes such QMP commands safer.
>>
>> I reproduced the problem as follows:
>>
>> 1. Create a VM with 4 disks, using an iothread.
>> 2. Put pressure on the host CPU. In my setup, the CPU usage
>>    exceeded 95%.
>> 3. Put pressure on all 4 disks in the VM at the same time, using
>>    fio with parameters such as:
>>
>>    fio -rw=randrw -bs=1M -size=1G -iodepth=512 -ioengine=libaio -numjobs=4
>>
>> 4. Run block query commands, for example via virsh:
>>
>>    virsh qemu-monitor-command [vm name] --hmp info block
>>
>> The VM will then soft lockup; the calltrace is:
>
> [snip]
>
>> This problem can be avoided after this series is applied.
>
> At what cost, though? With this timeout, QEMU is going to start
> reporting bogus failures for various QMP commands when running
> under high load, even if those commands would actually run
> successfully. This will turn into an error report from libvirt,
> which will in turn probably cause an error in the mgmt application
> using libvirt, and in turn could break the user's automation.
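To make the io_submit() blocking point above concrete before getting
to your concern: below is a minimal standalone program (my own
illustration, not code from this series) that times a single
io_submit() call against a file opened with O_DIRECT. On a slow or
saturated disk, the "asynchronous" submission itself can take
noticeable time. The file name "testfile" and the queue depth of 512
are arbitrary choices; the file must live on a filesystem that
supports O_DIRECT (not tmpfs). Build with:
gcc -O2 -o aio-test aio-test.c -laio

/* Sketch only: time one io_submit() call under a deep queue. */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define DEPTH   512             /* matches the fio iodepth above */
#define BLKSIZE (1024 * 1024)   /* matches the fio block size above */

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    io_context_t ctx = 0;
    struct iocb cbs[DEPTH], *cbp[DEPTH];
    struct io_event events[DEPTH];
    void *buf;
    double start;
    int fd, i, ret;

    fd = open("testfile", O_RDWR | O_CREAT | O_DIRECT, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (posix_memalign(&buf, 4096, BLKSIZE)) {  /* O_DIRECT alignment */
        return 1;
    }
    memset(buf, 0xab, BLKSIZE);

    ret = io_setup(DEPTH, &ctx);
    if (ret < 0) {
        fprintf(stderr, "io_setup: %s\n", strerror(-ret));
        return 1;
    }
    for (i = 0; i < DEPTH; i++) {
        /* All requests share one buffer; only the timing matters. */
        io_prep_pwrite(&cbs[i], fd, buf, BLKSIZE, (long long)i * BLKSIZE);
        cbp[i] = &cbs[i];
    }

    start = now_sec();
    ret = io_submit(ctx, DEPTH, cbp);       /* "async", but can block */
    printf("io_submit() returned %d after %.3f seconds\n",
           ret, now_sec() - start);

    if (ret > 0) {
        io_getevents(ctx, ret, ret, events, NULL);  /* reap completions */
    }
    io_destroy(ctx);
    close(fd);
    return 0;
}

While the iothread is stuck inside that call, it is holding its
AioContext lock, so a QMP query that needs the same AioContext has to
wait for however long the submission takes.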
As for the cost: I think reporting an error is worth it to avoid the
VM soft lockup. The VM may even crash if kernel.softlockup_panic is
configured! We can increase the timeout value (to something close to
the VM's CPU soft-lockup threshold) to avoid unnecessary errors.
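Concretely, I have in mind something like the following hypothetical
sketch (plain pthreads for illustration only, not the actual patch;
in QEMU the lock in question is the AioContext's mutex, and the
16-second budget is an arbitrary value chosen to stay below a typical
guest soft-lockup window):

/* Hypothetical sketch: take a lock with a deadline and fail the
 * QMP command instead of stalling forever. Not the series itself. */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define LOCK_TIMEOUT_SEC 16  /* arbitrary; below the soft-lockup window */

static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 0 on success, ETIMEDOUT if the lock was not taken in time. */
static int lock_with_timeout(pthread_mutex_t *lock)
{
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);  /* timedlock uses CLOCK_REALTIME */
    deadline.tv_sec += LOCK_TIMEOUT_SEC;
    return pthread_mutex_timedlock(lock, &deadline);
}

int main(void)
{
    int ret = lock_with_timeout(&ctx_lock);

    if (ret == ETIMEDOUT) {
        /* Report a recoverable error to the QMP caller instead of
         * letting the guest hit its soft-lockup threshold. */
        fprintf(stderr, "context busy, command timed out\n");
        return 1;
    }
    /* ... run the block query while holding the lock ... */
    pthread_mutex_unlock(&ctx_lock);
    return 0;
}

As long as the deadline stays below the guest's soft-lockup threshold,
the error path only fires in situations where the guest would
otherwise have locked up anyway; under ordinary load the lock is
acquired long before the deadline, so management applications should
rarely see the failure.

Thanks,
Zhenyu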