Hello,

I used Juan's latest bits to try out migration on large guest configurations.
Workload: a slightly modified SpecJBB.

The following is some very preliminary data...  (for the non-XBZRLE case).

FYI...
Vinod



Configuration:
------------------

Source & Target Host Hardware : 8-socket Westmere + 1TB RAM each. HT off.
                          (10Gb back-to-back connection dedicated for live
                           migration traffic)

Source & Target Host OS :  3.5-rc6+  (picked from kvm.git)


Guest OS : 3.4.1

qemu :  git://repo.or.cz/qemu/quintela.git -b migration-next-v3

----
1) 10VCPUs / 128GB guest
------------------------

(qemu) migrate_set_speed 10G
(qemu) migrate_set_downtime 2


a) Idle guest:

transferred ram: 2841554 kbytes
total ram: 134226368 kbytes
total time: 119350 milliseconds

Number of pre-copy iterations : 1766
Stage_3_time : ~3271 ms


b) SpecJBB (10 warehouse threads made to run for 10mins).

transferred ram: 236383717 kbytes
total ram: 134226368 kbytes
total time: 619110 milliseconds

Number of pre-copy iterations : 145515
Stage_3_time : ~3469 ms


2) 20VCPUs / 256GB guest
------------------------
(qemu) migrate_set_speed 10G
(qemu) migrate_set_downtime 2

a) Idle guest:

transferred ram: 5257340 kbytes
total ram: 268444096 kbytes
total time: 256496 milliseconds

Number of pre-copy iterations : 3379
Stage_3_time : 4281 ms


b) SpecJBB (20 warehouse threads made to run for 10mins)

transferred ram: 151607814 kbytes
total ram: 268444096 kbytes
total time: 653578 milliseconds

Number of pre-copy iterations : 28433
Stage_3_time : ~2670 ms


3) 40VCPUs / 512GB guest
------------------------
(qemu) migrate_set_speed 10G
(qemu) migrate_set_downtime 2

a) Idle guest:

transferred ram: 9534968 kbytes
total ram: 536879552 kbytes
total time: 665557 milliseconds

Number of pre-copy iterations : 11541
Stage_3_time : ~6210 ms


b) SpecJBB (40 warehouse threads made to run for 10mins)

transferred ram: 47845021 kbytes
total ram: 536879552 kbytes
total time: 760423 milliseconds

Number of pre-copy iterations : 15963
Stage_3_time : ~6180 ms


------

Note 1 : The Stage 3 time (aka "down" time) listed above is an approximation. I
measured the time spent in the ram_save_complete() routine. Notice that the
Stage 3 duration did exceed the 2-second downtime limit that was specified. Also,
the Stage 3 time seems to vary quite a bit from run to run with the same
configuration/workload.
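
(For reference, the measurement is essentially just bracketing the routine with
a monotonic clock read. Below is a minimal, self-contained sketch of that
pattern; the now_ms() helper and its placement are illustrative only, not the
actual patch against ram_save_complete().)

#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Illustrative helper: monotonic time in milliseconds. */
static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

int main(void)
{
    int64_t start = now_ms();
    /* ... the code being timed, e.g. the body of ram_save_complete() ... */
    int64_t elapsed = now_ms() - start;
    printf("stage 3 took ~%lld ms\n", (long long)elapsed);
    return 0;
}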


Note 2: Modified SpecJBB to run for a 10-minute duration with just a fixed number
of warehouse threads and a 24GB heap. [Did not try out any NUMA or other tuning
for these runs.]

Some observations:

- In all cases, the live guest migration converged only after the workload
completed running.

- Did not observe any guest freezes during Stage 2 (which was happening a lot
with the earlier versions). The ssh sessions to the guest stayed up for the
entire duration. (I forgot to run a ping to the guest...will do that next time.)

- Although SpecJBB was not really run like a typical benchmark but just as a
sample workload... I did observe the throughput, i.e. Bops (during live
migration vs. when it is run normally in a guest of the same size). During live
guest migration the Bops # dropped by ~20%. Need to analyze this further. I am
curious to hear what performance impacts others have experienced with either
SpecJBB or other workloads during KVM live guest migration. [As was suggested
earlier...perhaps having a migration thread along with other optimizations
around dirty page tracking may help.]

- Migration (TX) traffic [observed via the "iftop" utility] ranged between
1.5Gb/s and 3.0Gb/s (occasionally a bit higher) through the dedicated 10Gb link.
IOW, the dedicated link was not saturated to near line rate. Need to investigate
this further...i.e. not sure if this is all due to the overhead of tracking
dirty pages or something else. Any ideas?
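
(As a rough cross-check of the iftop readings, a back-of-envelope average from
the counters reported above, assuming the "transferred ram" counter is in KiB:
for case 1b, 236383717 kbytes * 1024 * 8 bits / 619110 ms ~= 3.1 Gb/s average,
which is consistent with the 1.5Gb/s-3.0Gb/s observed rates and well below the
10Gb line rate.)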
