* Claudio Fontana (cfont...@suse.de) wrote:
> On 4/7/22 3:57 PM, Claudio Fontana wrote:
> > On 4/7/22 3:53 PM, Dr. David Alan Gilbert wrote:
> >> * Claudio Fontana (cfont...@suse.de) wrote:
> >>> On 4/5/22 10:35 AM, Dr. David Alan Gilbert wrote:
> >>>> * Claudio Fontana (cfont...@suse.de) wrote:
> >>>>> On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
> >>>>>> On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
> >>>>>>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
> >>>>>>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
> >>>>>>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
> >>>>>>>>>> * Claudio Fontana (cfont...@suse.de) wrote:
> >>>>>>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
> >>>>>>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
> >>>>>>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
> >>>>>>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
> >>>>>>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
> >>>>>>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
> >>>>>>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
> >>>>>>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
> >>>>>>>>>>>>>>>>>>> the first user is the qemu driver,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> This improves the situation by 400%.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Going through io_helper still seems to incur some penalty (~15%-ish)
> >>>>>>>>>>>>>>>>>>> compared with direct qemu migration to an nc socket to a file.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfont...@suse.de>
> >>>>>>>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
> >>>>>>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
> >>>>>>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
> >>>>>>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
> >>>>>>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
> >>>>>>>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
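As background for the numbers that follow, the knob in play is Linux's F_SETPIPE_SZ fcntl. A minimal sketch, purely illustrative and not the libvirt code from the patch above (example_set_pipe_size is a made-up name):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>

/* Illustrative only: grow a pipe's buffer from the default 64k.
 * Unprivileged requests are capped by /proc/sys/fs/pipe-max-size
 * (1 MB by default), so a failure here need not be fatal. */
static int example_set_pipe_size(int fd, int size)
{
    int actual = fcntl(fd, F_SETPIPE_SZ, size);

    if (actual < 0) {
        perror("F_SETPIPE_SZ");
        return -1;
    }
    /* The kernel may round the request up; the return value is the
     * capacity actually set. */
    printf("pipe capacity is now %d bytes\n", actual);
    return 0;
}
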
> >>>>>>>>>>>>>> Current results show the experimental average maximum throughput
> >>>>>>>>>>>>>> migrating to /dev/null for each FdWrapper pipe size (as per QEMU QMP
> >>>>>>>>>>>>>> "query-migrate", tests repeated 5 times for each).
> >>>>>>>>>>>>>> VM size is 60G, with most of the memory effectively touched before
> >>>>>>>>>>>>>> migration by a user application allocating and touching all memory
> >>>>>>>>>>>>>> with pseudorandom data.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 64K:   5200 Mbps (current situation)
> >>>>>>>>>>>>>> 128K:  5800 Mbps
> >>>>>>>>>>>>>> 256K:  20900 Mbps
> >>>>>>>>>>>>>> 512K:  21600 Mbps
> >>>>>>>>>>>>>> 1M:    22800 Mbps
> >>>>>>>>>>>>>> 2M:    22800 Mbps
> >>>>>>>>>>>>>> 4M:    22400 Mbps
> >>>>>>>>>>>>>> 8M:    22500 Mbps
> >>>>>>>>>>>>>> 16M:   22800 Mbps
> >>>>>>>>>>>>>> 32M:   22900 Mbps
> >>>>>>>>>>>>>> 64M:   22900 Mbps
> >>>>>>>>>>>>>> 128M:  22800 Mbps
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The above is the throughput out of the patched libvirt with multiple
> >>>>>>>>>>>>>> pipe sizes for the FdWrapper.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Ok, it's bouncing around with noise after 1 MB. So I'd suggest that
> >>>>>>>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
> >>>>>>>>>>>>> not try to go higher.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> As for the theoretical limit for the libvirt architecture,
> >>>>>>>>>>>>>> I ran a qemu migration directly, issuing the appropriate QMP commands,
> >>>>>>>>>>>>>> setting the same migration parameters as per libvirt, and then
> >>>>>>>>>>>>>> migrating to a socket netcatted to /dev/null via
> >>>>>>>>>>>>>> {"execute": "migrate", "arguments": { "uri": "unix:///tmp/netcat.sock" } }:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> QMP: 37000 Mbps
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So although the pipe size improves things (in particular the large
> >>>>>>>>>>>>>> jump is for the 256K size, although 1M seems a very good value),
> >>>>>>>>>>>>>> there is still a second bottleneck in there somewhere that accounts
> >>>>>>>>>>>>>> for a loss of ~14200 Mbps in throughput.
> >>>>>>>>>>>
> >>>>>>>>>>> Interesting addition: I tested quickly on a system with faster cpus
> >>>>>>>>>>> and larger VM sizes, up to 200GB, and the difference in throughput
> >>>>>>>>>>> libvirt vs qemu is basically the same, ~14500 Mbps:
> >>>>>>>>>>>
> >>>>>>>>>>> ~50000 Mbps qemu to netcat socket to /dev/null
> >>>>>>>>>>> ~35500 Mbps virsh save to /dev/null
> >>>>>>>>>>>
> >>>>>>>>>>> It does not seem to be proportional to cpu speed (not a totally fair
> >>>>>>>>>>> comparison because the VM sizes are different).
> >>>>>>>>>>
> >>>>>>>>>> It might be closer to RAM or cache bandwidth limited though, because
> >>>>>>>>>> of the extra copy.
> >>>>>>>>>
> >>>>>>>>> I was thinking about sendfile(2) in iohelper, but that probably can't
> >>>>>>>>> work as the input fd is a socket, I am getting EINVAL.
> >>>>>>>>
> >>>>>>>> Yep, sendfile() requires the input to be a mmapable FD,
> >>>>>>>> and the output to be a socket.
> >>>>>>>>
> >>>>>>>> Try splice() instead, which merely requires one end to be a pipe,
> >>>>>>>> and the other end can be any FD afaik.
> >>>>>>>
> >>>>>>> I did try splice(), but performance is worse by around 500%.
> >>>>>>
> >>>>>> Hmm, that's certainly unexpected!
> >>>>>>
> >>>>>>> Any ideas welcome,
> >>>>>>
> >>>>>> I learnt there is also a newer copy_file_range call, not sure if that's
> >>>>>> any better.
> >>>>>>
> >>>>>> You passed len as 1 MB, I wonder if passing MAXINT is viable? We just
> >>>>>> want to copy everything IIRC.
> >>>>>>
> >>>>>> With regards,
> >>>>>> Daniel
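For reference, a minimal sketch of the kind of splice() copy loop being discussed here, draining a pipe into an arbitrary output fd with a large per-call len as suggested above. Purely illustrative, not the code that was benchmarked; example_splice_all is a made-up name:

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <unistd.h>

/* Illustrative only: drain a pipe into any output fd with splice(2).
 * splice() needs at least one side to be a pipe; 'len' is only an
 * upper bound per call, so a large value such as INT_MAX simply lets
 * the kernel move whatever is available on each iteration. */
static ssize_t example_splice_all(int pipe_in, int fd_out)
{
    ssize_t total = 0;

    for (;;) {
        ssize_t n = splice(pipe_in, NULL, fd_out, NULL,
                           INT_MAX, SPLICE_F_MOVE);

        if (n == 0)
            return total;            /* writer closed the pipe */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        total += n;
    }
}

A caller would hand it the read side of the pipe and the destination file descriptor.
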
> >>>>> Crazy idea: would trying to use the parallel migration concept for
> >>>>> migrating to/from a file make any sense?
> >>>>>
> >>>>> Not sure the qemu multifd implementation of this would apply directly;
> >>>>> maybe it could be given another implementation for "toFile", trying to
> >>>>> use more than one cpu to do the transfer?
> >>>>
> >>>> I can't see a way that would help; well, I could if you could somehow
> >>>> have multiple io helper threads that dealt with it.
> >>>
> >>> The first issue I encounter here for both the "virsh save" and "virsh restore"
> >>> scenarios is that libvirt uses fd: migration, not unix: migration.
> >>> QEMU supports multifd for unix:, tcp:, vsock: as far as I can see.
> >>>
> >>> Current save procedure in QMP, in short:
> >>>
> >>> {"execute":"migrate-set-capabilities", ...}
> >>> {"execute":"migrate-set-parameters", ...}
> >>> {"execute":"getfd","arguments":{"fdname":"migrate"}, ...}   fd=26
> >>> QEMU_MONITOR_IO_SEND_FD: fd=26
> >>> {"execute":"migrate","arguments":{"uri":"fd:migrate"}, ...}
> >>>
> >>> Current restore procedure in QMP, in short:
> >>>
> >>> (start QEMU)
> >>> {"execute":"migrate-incoming","arguments":{"uri":"fd:21"}, ...}
> >>>
> >>> Should I investigate changing libvirt to use unix: for save/restore?
> >>> Or should I look into changing qemu to somehow accept fd: for multifd,
> >>> meaning I guess providing multiple fd: uris in the migrate command?
> >>
> >> So I'm not sure this is the right direction; i.e. whether multifd is the
> >> right answer to your problem.
> >
> > Of course, just exploring the space.
> 
> I have some progress on multifd, if we can call it that:
> 
> I wrote a simple program that sets up a unix socket, listens for
> N_CHANNELS + 1 connections there, sets up the multifd parameters, and runs
> the migration, spawning a thread for each incoming connection from QEMU and
> creating a file to store the migration data coming from qemu (optionally
> using O_DIRECT).
> 
> This program plays the role of an "iohelper"-like thing, basically just
> copying things over, making O_DIRECT possible.
> 
> I save the data streams to multiple files; this works. For the actual
> results, though, I will have to move to a better hardware setup (enterprise
> nvme + fast cpu, under various memory configurations).
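To make the shape of that experiment concrete, a minimal sketch of such a receiver follows: listen on a unix socket, accept the main channel plus N_CHANNELS multifd connections, and drain each one into its own file from a dedicated thread. Illustrative only, not Claudio's actual program; the socket path, output file names and channel count are made-up values, and error handling, O_DIRECT alignment and the QMP driving of qemu are left out.

/* Illustrative sketch only: accept the main migration connection plus
 * N_CHANNELS multifd connections on a unix socket and write each
 * stream to its own file.  Build with -pthread. */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define N_CHANNELS 4                      /* must match multifd-channels */
#define BUF_SIZE   (1024 * 1024)

struct channel {
    int sock;
    char path[64];
};

/* Drain one connection from qemu into its own output file. */
static void *drain_channel(void *opaque)
{
    struct channel *c = opaque;
    int out = open(c->path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    char *buf = malloc(BUF_SIZE);
    ssize_t n;

    while ((n = read(c->sock, buf, BUF_SIZE)) > 0)
        write(out, buf, n);               /* short writes ignored for brevity */

    free(buf);
    close(out);
    close(c->sock);
    return NULL;
}

int main(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    struct channel chan[N_CHANNELS + 1];
    pthread_t tid[N_CHANNELS + 1];
    int lsock = socket(AF_UNIX, SOCK_STREAM, 0);
    int i;

    strcpy(addr.sun_path, "/tmp/multifd.sock");
    unlink(addr.sun_path);
    bind(lsock, (struct sockaddr *)&addr, sizeof(addr));
    listen(lsock, N_CHANNELS + 1);

    /* qemu connects once for the main stream and once per multifd
     * channel after "migrate" to unix:///tmp/multifd.sock is issued
     * over QMP with the multifd capability enabled. */
    for (i = 0; i < N_CHANNELS + 1; i++) {
        chan[i].sock = accept(lsock, NULL, NULL);
        snprintf(chan[i].path, sizeof(chan[i].path),
                 "/tmp/migration-stream-%d", i);
        pthread_create(&tid[i], NULL, drain_channel, &chan[i]);
    }

    for (i = 0; i < N_CHANNELS + 1; i++)
        pthread_join(tid[i], NULL);

    close(lsock);
    return 0;
}

Compressing on the qemu side (multifd-zstd/zlib, as speculated below) would not change this receiver; it would only change what ends up in the per-channel files.
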
> The intuition is that if we have enough cpus to spare (no libvirt in the
> picture for now, as mentioned), say, the same 4 cpus already allocated for a
> certain VM to run, we can use those cpus (now "free" since we suspended the
> guest) to compress each multifd channel (multifd-zstd? multifd-zlib?), thus
> reducing the amount of data that needs to go to disk.

Yes possibly; you have an advantage over normal migration, in that your vCPUs
are stopped.

> Work in progress...
> 
> >> However, I think the qemu code probably really really wants to be a
> >> socket.
> > 
> > Understood, I'll try to bend libvirt to use unix:/// and see how far I get.
> > 
> > Thanks,
> > 
> > Claudio
> > 
> >> Dave
> >>
> >>> Thank you for your help,
> >>>
> >>> Claudio

-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK