On Tue, Feb 06, 2024 at 11:19:02PM +0000, Hao Xiang wrote:
> This patchset is based on Juan Quintela's old series here
> https://lore.kernel.org/all/20220802063907.18882-1-quint...@redhat.com/
>
> In the multifd live migration model, there is a single migration main
> thread scanning the page map and queuing the pages to multiple multifd
> sender threads. The migration main thread runs zero page checking on
> every page before queuing the page to the sender threads. Zero page
> checking is a CPU-intensive task, so having a single thread do all of
> it doesn't scale well. This change introduces a new function to run
> the zero page checking on the multifd sender threads. This patchset
> also lays the groundwork for future changes to offload the zero page
> checking task to accelerator hardware.
>
> Two Intel 4th generation Xeon servers were used for testing.
>
> Architecture:          x86_64
> CPU(s):                192
> Thread(s) per core:    2
> Core(s) per socket:    48
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 143
> Model name:            Intel(R) Xeon(R) Platinum 8457C
> Stepping:              8
> CPU MHz:               2538.624
> CPU max MHz:           3800.0000
> CPU min MHz:           800.0000
>
> Perform multifd live migration with the setup below:
> 1. VM has 100GB memory. All pages in the VM are zero pages.
> 2. Use a TCP socket for live migration.
> 3. Use 4 multifd channels and zero page checking on the migration main
>    thread.
> 4. Use 1/2/4 multifd channels and zero page checking on the multifd
>    sender threads.
> 5. Record migration total time from the sender QEMU console's
>    "info migrate" command.
> 6. Calculate throughput as "100GB / total time".
>
> +-------------------------------------------------------+
> | zero-page-checking | total-time(ms) | throughput(GB/s)|
> +-------------------------------------------------------+
> | main-thread        | 9629           | 10.38GB/s       |
> +-------------------------------------------------------+
> | multifd-1-threads  | 6182           | 16.17GB/s       |
> +-------------------------------------------------------+
> | multifd-2-threads  | 4643           | 21.53GB/s       |
> +-------------------------------------------------------+
> | multifd-4-threads  | 4143           | 24.13GB/s       |
> +-------------------------------------------------------+
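For illustration only, here is a minimal, self-contained sketch of the kind of per-page zero detection the cover letter describes moving from the main thread onto the multifd sender threads. It is not the patch's actual code: QEMU's real implementation uses buffer_is_zero() (util/bufferiszero.c) and multifd's own packet structures, so the names below (SendPacket, packet_detect_zero_pages, page_is_zero) are hypothetical stand-ins.

    /*
     * Sketch: a sender thread splits its queued pages into "zero" and
     * "normal" sets, so the main thread only scans the dirty bitmap and
     * queues pages while the CPU-heavy scanning happens here.
     * Types and names are illustrative, not QEMU's real API.
     */
    #include <inttypes.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096
    #define MAX_PAGES 128

    /* Portable stand-in for QEMU's SIMD-accelerated buffer_is_zero(). */
    static bool page_is_zero(const uint8_t *page)
    {
        for (size_t i = 0; i < PAGE_SIZE; i++) {
            if (page[i]) {
                return false;
            }
        }
        return true;
    }

    /* Hypothetical per-packet state owned by one sender thread. */
    typedef struct {
        uint8_t *pages[MAX_PAGES];   /* host addresses of queued pages  */
        uint32_t num_pages;          /* pages queued by the main thread */
        uint32_t zero[MAX_PAGES];    /* indexes of detected zero pages  */
        uint32_t normal[MAX_PAGES];  /* indexes of pages to transmit    */
        uint32_t zero_num;
        uint32_t normal_num;
    } SendPacket;

    /* Runs on the sender thread, off the migration main thread. */
    static void packet_detect_zero_pages(SendPacket *p)
    {
        p->zero_num = p->normal_num = 0;
        for (uint32_t i = 0; i < p->num_pages; i++) {
            if (page_is_zero(p->pages[i])) {
                p->zero[p->zero_num++] = i;      /* send only the index  */
            } else {
                p->normal[p->normal_num++] = i;  /* send full page data  */
            }
        }
    }

    int main(void)
    {
        static uint8_t zero_page[PAGE_SIZE];          /* all zeroes     */
        static uint8_t data_page[PAGE_SIZE] = { 42 }; /* first byte set */

        SendPacket p = { .num_pages = 2 };
        p.pages[0] = zero_page;
        p.pages[1] = data_page;

        packet_detect_zero_pages(&p);
        printf("zero=%" PRIu32 " normal=%" PRIu32 "\n",
               p.zero_num, p.normal_num);
        return 0;
    }

The point of the restructuring is visible here: the expensive per-page scan is done by whichever thread owns the packet, so adding channels adds zero-page-checking capacity instead of serializing it on the main thread.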
This "throughput" is slightly confusing; I was initially surprised to see a large throughput for idle guests. IMHO the "total-time" would explain. Feel free to drop that column if there's a repost. Did you check why 4 channels mostly already reached the top line? Is it because main thread is already spinning 100%? Thanks, -- Peter Xu