On 3/9/22 3:22 PM, Claudio Fontana wrote:
> On 3/9/22 12:51 PM, Daniel P. Berrangé wrote:
>> On Wed, Mar 09, 2022 at 11:43:48AM +0000, Dr. David Alan Gilbert wrote:
>>> * Claudio Fontana (cfont...@suse.de) wrote:
>>>> On 3/7/22 1:28 PM, Dr. David Alan Gilbert wrote:
>>>>> * Claudio Fontana (cfont...@suse.de) wrote:
>>>>>> On 3/7/22 1:20 PM, Daniel P. Berrangé wrote:
>>>>>>> On Mon, Mar 07, 2022 at 01:09:55PM +0100, Claudio Fontana wrote:
>>>>>>>> Got it, this explains it, sorry for the noise on this.
>>>>>>>>
>>>>>>>> I'll continue to investigate the general issue of low throughput with 
>>>>>>>> virsh save / qemu savevm.
>>>>>>>
>>>>>>> BTW, consider measuring with the --bypass-cache flag to virsh save.
>>>>>>> This causes libvirt to use an I/O helper that uses O_DIRECT when
>>>>>>> saving the image. This should give more predictable results by
>>>>>>> avoiding the influence of host I/O cache which can be in a different
>>>>>>> state of usage each time you measure.  It was also intended that
>>>>>>> by avoiding hitting cache, saving the memory image of a large VM
>>>>>>> will not push other useful stuff out of host I/O cache which can
>>>>>>> negatively impact other running VMs.
>>>>>>>
>>>>>>> Also it is possible to configure compression on the libvirt side,
>>>>>>> which may be useful if you have spare CPU cycles but your storage
>>>>>>> is slow. See 'save_image_format' in /etc/libvirt/qemu.conf.
>>>>>>>
>>>>>>> With regards,
>>>>>>> Daniel
>>>>>>>
>>>>>>
>>>>>> Hi Daniel, thanks for this useful info,
>>>>>>
>>>>>> regarding slow storage: for these tests I am saving to /dev/null to 
>>>>>> avoid having to take storage into account
>>>>>> (and still getting low bandwidth unfortunately), so I guess compression 
>>>>>> is out of the question.
>>>>>
>>>>> What type of speeds do you get if you try a migrate to a netcat socket?
>>>>
>>>> Much faster apparently: 30 sec for savevm vs 7 seconds for migration to a 
>>>> netcat socket sent to /dev/null.
>>>>
>>>> nc -l -U /tmp/savevm.socket
>>>>
>>>> virsh suspend centos7
>>>> Domain centos7 suspended
>>>>
>>>> virsh qemu-monitor-command --cmd '{ "execute": "migrate", "arguments": { "uri": "unix:///tmp/savevm.socket" } }' centos7
>>>>
>>>> virt97:/mnt # virsh qemu-monitor-command --cmd '{ "execute": "query-migrate" }' centos7
>>>> {"return":{"blocked":false,"status":"completed","setup-time":118,"downtime":257,"total-time":7524,"ram":{"total":32213049344,"postcopy-requests":0,"dirty-sync-count":3,"multifd-bytes":0,"pages-per-second":1057530,"page-size":4096,"remaining":0,"mbps":24215.572437483122,"transferred":22417172290,"duplicate":2407520,"dirty-pages-rate":0,"skipped":0,"normal-bytes":22351847424,"normal":5456994}},"id":"libvirt-438"}
>>>>
>>>> virt97:/mnt # virsh qemu-monitor-command --cmd '{ "execute": "query-migrate-parameters" }' centos7
>>>> {"return":{"cpu-throttle-tailslow":false,"xbzrle-cache-size":67108864,"cpu-throttle-initial":20,"announce-max":550,"decompress-threads":2,"compress-threads":8,"compress-level":0,"multifd-channels":8,"multifd-zstd-level":1,"announce-initial":50,"block-incremental":false,"compress-wait-thread":true,"downtime-limit":300,"tls-authz":"","multifd-compression":"none","announce-rounds":5,"announce-step":100,"tls-creds":"","multifd-zlib-level":1,"max-cpu-throttle":99,"max-postcopy-bandwidth":0,"tls-hostname":"","throttle-trigger-threshold":50,"max-bandwidth":9223372036853727232,"x-checkpoint-delay":20000,"cpu-throttle-increment":10},"id":"libvirt-439"}
>>>>
>>>>
>>>> I also did a run with multifd-channels:1 instead of 8, if it matters:
>>>
>>> I suspect you haven't actually got multifd enabled (check
>>> query-migrate-capabilities?).
>>>>
>>>> virt97:/mnt # virsh qemu-monitor-command --cmd '{ "execute": "query-migrate" }' centos7
>>>> {"return":{"blocked":false,"status":"completed","setup-time":119,"downtime":260,"total-time":8601,"ram":{"total":32213049344,"postcopy-requests":0,"dirty-sync-count":3,"multifd-bytes":0,"pages-per-second":908820,"page-size":4096,"remaining":0,"mbps":21141.861157274227,"transferred":22415264188,"duplicate":2407986,"dirty-pages-rate":0,"skipped":0,"normal-bytes":22349938688,"normal":5456528}},"id":"libvirt-453"}
>>>>
>>>> virt97:/mnt # virsh qemu-monitor-command --cmd '{ "execute": "query-migrate-parameters" }' centos7
>>>> {"return":{"cpu-throttle-tailslow":false,"xbzrle-cache-size":67108864,"cpu-throttle-initial":20,"announce-max":550,"decompress-threads":2,"compress-threads":8,"compress-level":0,"multifd-channels":1,"multifd-zstd-level":1,"announce-initial":50,"block-incremental":false,"compress-wait-thread":true,"downtime-limit":300,"tls-authz":"","multifd-compression":"none","announce-rounds":5,"announce-step":100,"tls-creds":"","multifd-zlib-level":1,"max-cpu-throttle":99,"max-postcopy-bandwidth":0,"tls-hostname":"","throttle-trigger-threshold":50,"max-bandwidth":9223372036853727232,"x-checkpoint-delay":20000,"cpu-throttle-increment":10},"id":"libvirt-454"}
>>>>
>>>>
>>>> Still we are in the 20 Gbps range, or around 2560 MiB/s, way faster than 
>>>> savevm, which does around 600 MiB/s when the wind is in its favor.
>>>
>>> Yeah, that's what I'd hope for off a decent CPU; hmm, there's not that much
>>> savevm-specific code, is there?
>>
>> BTW, quick clarification here.
>>
>> IIUC, Claudio says the test is 'virsh save $VMNAME /some/file'. This
>> is *not* running 'savevm' at the QEMU level. So it is a bit misleading
>> referring to it as savevm in the thread here.
> 
> 
> Thanks, this is a helpful clarification; I was wrongly assuming those were 
> linked.
> Indeed, the use case is virsh save.
> 
>>
>> 'virsh save' is simply wired up to the normal QEMU 'migrate' commands,
>> with libvirt giving QEMU a pre-opened FD whose other end libvirt reads
>> and writes out to disk.
>>
>> IOW, the performance delta is possibly on libvirt's side rather
>> than QEMU's.
> 
> Interesting. Also CCing Jim on this; I'll continue to do more experiments.
> 


One difference I could see, looking at the QMP commands issued by libvirt in the 
"virsh save" case, is "detach":true in the "migrate" command (which seems to 
have no effect in qemu).
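(One quick way to rule that flag out, as an untested sketch: issue the same 
arguments by hand against the unix socket used above, copying "detach", "blk" 
and "inc" from the libvirt log further down, e.g.

virsh qemu-monitor-command --cmd '{ "execute": "migrate", "arguments": { "detach": true, "blk": false, "inc": false, "uri": "unix:///tmp/savevm.socket" } }' centos7

and see whether throughput drops to "virsh save" levels.)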


And maybe more interestingly, there is this stuff about the "fd":


2022-03-09 17:29:34.247+0000: 20390: info : qemuMonitorSend:995 : 
QEMU_MONITOR_SEND_MSG: mon=0x7faa9003ebf0 
msg={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-390"}^M
 fd=34
2022-03-09 17:29:34.247+0000: 20387: info : qemuMonitorIOWrite:452 : 
QEMU_MONITOR_IO_WRITE: mon=0x7faa9003ebf0 
buf={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-390"}^M
 len=73 ret=73 errno=0
2022-03-09 17:29:34.247+0000: 20387: info : qemuMonitorIOWrite:457 : 
QEMU_MONITOR_IO_SEND_FD: mon=0x7faa9003ebf0 fd=34 ret=73 errno=0
2022-03-09 17:29:34.248+0000: 20387: info : qemuMonitorJSONIOProcessLine:240 : 
QEMU_MONITOR_RECV_REPLY: mon=0x7faa9003ebf0 reply={"return": {}, "id": 
"libvirt-390"}
2022-03-09 17:29:34.249+0000: 20390: info : qemuMonitorSend:995 : 
QEMU_MONITOR_SEND_MSG: mon=0x7faa9003ebf0 
msg={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-391"}^M
 fd=-1
2022-03-09 17:29:34.249+0000: 20387: info : qemuMonitorIOWrite:452 : 
QEMU_MONITOR_IO_WRITE: mon=0x7faa9003ebf0 
buf={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-391"}^M
 len=113 ret=113 errno=0


In qemu I am currently looking at the code in migration/socket.c vs the code in 
migration/fd.c, and wondering whether the difference could stem from there.
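If it is the transport itself, one more data point I could collect (an untested 
sketch, assuming the exec: migration URI behaves the same through 
qemu-monitor-command as unix: did above) is migrating through exec:, which as 
far as I understand hands the stream to a child process over a pipe, so it 
should be closer to the fd-plus-helper-process path libvirt uses than the unix 
socket test:

virsh suspend centos7

virsh qemu-monitor-command --cmd '{ "execute": "migrate", "arguments": { "uri": "exec:cat > /dev/null" } }' centos7

virsh qemu-monitor-command --cmd '{ "execute": "query-migrate" }' centos7

If exec: is about as fast as unix:, the slowdown is more likely on the reader 
side in libvirt; if it is as slow as "virsh save", that would point at the 
pipe/fd path in qemu itself.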

Thanks,

Claudio
