[Qemu-devel] [PATCH v2 2/7] KVM: MMU: introduce possible_writable_spte_bitmap

2017-06-20 Thread guangrong . xiao
From: Xiao Guangrong It is used to track possible writable sptes on the shadow page: the bit is set to 1 for sptes that are already writable or can be locklessly updated to writable on the fast_page_fault path. A counter for the number of possible writable sptes is also introduced to
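
A minimal sketch of the idea described above, assuming a per-shadow-page bitmap plus counter; the structure and helper names here are hypothetical illustrations, not the KVM code from the patch:

#include <stdbool.h>
#include <stdint.h>

#define SPTES_PER_SHADOW_PAGE 512   /* 512 sptes per 4K shadow page on x86-64 */

/* Hypothetical shadow-page bookkeeping: one bit per spte that is writable
 * or lockless-fixable, plus a counter of such sptes. */
struct shadow_page_track {
    uint64_t possible_writable_spte_bitmap[SPTES_PER_SHADOW_PAGE / 64];
    unsigned int possible_writable_sptes;
};

static bool test_and_set_bit64(uint64_t *map, unsigned int idx)
{
    uint64_t mask = 1ULL << (idx % 64);
    bool was_set = map[idx / 64] & mask;

    map[idx / 64] |= mask;
    return was_set;
}

/* Called when an spte becomes writable (or fixable on the fast_page_fault
 * path): set its bit and bump the counter only the first time. */
static void mark_possible_writable(struct shadow_page_track *sp, unsigned int index)
{
    if (!test_and_set_bit64(sp->possible_writable_spte_bitmap, index)) {
        sp->possible_writable_sptes++;
    }
}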

[Qemu-devel] [PATCH v2 3/7] KVM: MMU: introduce kvm_mmu_write_protect_all_pages

2017-06-20 Thread guangrong . xiao
From: Xiao Guangrong The original idea is from Avi. kvm_mmu_write_protect_all_pages() is extremely fast to write protect all the guest memory. Compared with the ordinary algorithm, which write-protects last-level sptes one by one based on the rmap, it simply updates the generation number to
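
The preview cuts off before the mechanism; a hedged sketch of the generation-number idea as I read it (all names hypothetical, not the series' code): bump a global generation instead of walking every rmap, and let pages whose recorded generation is stale be fixed up lazily.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Global generation; incrementing it logically write-protects everything. */
static atomic_uint_fast64_t wp_all_generation;

struct shadow_page {
    uint64_t wp_generation;   /* generation when this page was last (re)built */
    /* ... sptes, links, etc. ... */
};

/* O(1): no rmap walk; existing shadow pages simply become stale. */
static void write_protect_all_pages(void)
{
    atomic_fetch_add(&wp_all_generation, 1);
}

/* Fault path: a stale page must drop or rebuild its writable sptes before
 * the guest can write through them again. */
static bool shadow_page_is_stale(const struct shadow_page *sp)
{
    return sp->wp_generation != atomic_load(&wp_all_generation);
}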

[Qemu-devel] [PATCH v2 1/7] KVM: MMU: correct the behavior of mmu_spte_update_no_track

2017-06-20 Thread guangrong . xiao
From: Xiao Guangrong The current behavior of mmu_spte_update_no_track() does not match its _no_track() name, as the A/D bits are actually tracked and returned to the caller. This patch introduces a real _no_track() function that updates the spte regardless of the A/D bits and renames the original functio

[Qemu-devel] [PATCH v2 6/7] KVM: MMU: clarify fast_pf_fix_direct_spte

2017-06-20 Thread guangrong . xiao
From: Xiao Guangrong A writable spte cannot be locklessly fixed, so add a WARN_ON() to trigger a warning if something unexpected happens; that helps us track whether the log for a writable spte is missed on the fast path Signed-off-by: Xiao Guangrong --- arch/x86/kvm/mmu.c | 11 +++

[Qemu-devel] [PATCH v2 5/7] KVM: MMU: allow dirty log without write protect

2017-06-20 Thread guangrong . xiao
From: Xiao Guangrong A new flag, KVM_DIRTY_LOG_WITHOUT_WRITE_PROTECT, is introduced to indicate that userspace just wants to get a snapshot of the dirty bitmap. During live migration, after the snapshot of the dirty bitmap is fetched from KVM, the guest memory can be write-protected by calling KVM_W

[Qemu-devel] [PATCH v2 0/7] KVM: MMU: fast write protect

2017-06-20 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v2: thanks to Paolo's review, this version disables write-protect-all if PML is supported. Background == The original idea of this patchset is from Avi, who raised it on the mailing list during my vMMU development some years ago. This patchset introduces a

[Qemu-devel] [PATCH v2 4/7] KVM: MMU: enable KVM_WRITE_PROTECT_ALL_MEM

2017-06-20 Thread guangrong . xiao
From: Xiao Guangrong The functionality of write protection for all guest memory is ready; it is time to make it usable for userspace, which is indicated by KVM_CAP_X86_WRITE_PROTECT_ALL_MEM Signed-off-by: Xiao Guangrong --- arch/x86/kvm/x86.c | 21 + include/uapi/

[Qemu-devel] [PATCH v2 7/7] KVM: MMU: stop using mmu_spte_get_lockless under mmu-lock

2017-06-20 Thread guangrong . xiao
From: Xiao Guangrong mmu_spte_age() is under the protection of the mmu-lock, so there is no reason to use mmu_spte_get_lockless() Signed-off-by: Xiao Guangrong --- arch/x86/kvm/mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 7711953..dc00

[Qemu-devel] [PATCH 0/5] mc146818rtc: fix Windows VM clock faster

2017-04-12 Thread guangrong . xiao
From: Xiao Guangrong We noticed that the clock on some Windows VMs, e.g., Windows 7 and Windows 8, runs noticeably fast; the issue can be easily reproduced by starting the VM with '-rtc base=localtime,clock=vm,driftfix=slew -no-hpet' and running the attached code in the guest. The root cause is that the clo

[Qemu-devel] [PATCH 3/5] mc146818rtc: properly count the time for the next interrupt

2017-04-12 Thread guangrong . xiao
From: Tai Yunfang If periodic_timer_update() is called due to RegA reconfiguration, i.e., the period is updated, the current time is not the start point for the next periodic timer; instead, it should start from the last interrupt, otherwise the clock in the VM will become slow. This patch takes the c
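
A hedged sketch of the counting fix described above (simplified, hypothetical helper name, not the QEMU function): the next expiry is derived from the last interrupt time plus whole periods, rather than restarting from "now", so reprogramming RegA does not stretch the current period.

#include <stdint.h>

/* All times in RTC clock ticks; 'period' is the (possibly new) periodic
 * period and is assumed to be > 0. */
static int64_t next_periodic_expiry(int64_t last_irq_time, int64_t now,
                                    int64_t period)
{
    /* Count full periods already elapsed since the last interrupt so the
     * timer keeps its phase instead of restarting from 'now'. */
    int64_t elapsed = now - last_irq_time;
    int64_t periods = elapsed / period + 1;

    return last_irq_time + periods * period;
}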

[Qemu-devel] [PATCH 1/5] mc146818rtc: update periodic timer only if it is needed

2017-04-12 Thread guangrong . xiao
From: Xiao Guangrong Currently, the timer is updated whenever RegA or RegB is written, even if the periodic-timer-related configuration is not changed. This patch optimizes it slightly so that the update happens only if the period or enable status is changed; later patches also depend on this o

[Qemu-devel] [PATCH 4/5] mc146818rtc: move x86 specific code out of periodic_timer_update

2017-04-12 Thread guangrong . xiao
From: Xiao Guangrong Move the x86-specific code in periodic_timer_update() to a common place; the actual logic is not changed Signed-off-by: Xiao Guangrong --- hw/timer/mc146818rtc.c | 112 + 1 file changed, 66 insertions(+), 46 deletions(-) dif

[Qemu-devel] [PATCH 5/5] mc146818rtc: embrace all x86 specific code

2017-04-12 Thread guangrong . xiao
From: Xiao Guangrong This patch introduces proper hooks in the common code, then moves the x86-specific code into a single '#ifdef TARGET_I386' block. The real logic is not touched Signed-off-by: Xiao Guangrong --- hw/timer/mc146818rtc.c | 197 ++--- 1 file cha

[Qemu-devel] [PATCH 2/5] mc146818rtc: fix clock lost after scaling coalesced irq

2017-04-12 Thread guangrong . xiao
From: Xiao Guangrong If the period is changed by re-configuring RegA, the coalesced irq will be scaled to reflect the new period; however, it calculates the new interrupt count like this: s->irq_coalesced = (s->irq_coalesced * s->period) / period; Some clocks will be lost if they
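
The integer division in the quoted line silently drops the remainder, i.e. partial periods of already-elapsed time. A hedged sketch of one way to preserve them, assuming a hypothetical carried-remainder field (this is an illustration, not the patch's diff):

#include <stdint.h>

struct rtc_coalesce_state {
    uint32_t irq_coalesced;   /* interrupts not yet delivered */
    uint64_t period;          /* current period, in ns */
    uint64_t lost_clock;      /* hypothetical: leftover time below one period */
};

/* Rescale pending coalesced irqs when the period changes, without losing
 * the fractional part that (old * s->period) / new_period would drop. */
static void rescale_coalesced_irq(struct rtc_coalesce_state *s, uint64_t new_period)
{
    uint64_t pending_ns = (uint64_t)s->irq_coalesced * s->period + s->lost_clock;

    s->irq_coalesced = pending_ns / new_period;
    s->lost_clock    = pending_ns % new_period;  /* carry remainder forward */
    s->period        = new_period;
}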

[Qemu-devel] [PATCH v3 1/5] mc146818rtc: update periodic timer only if it is needed

2017-05-10 Thread guangrong . xiao
From: Xiao Guangrong Currently, the timer is updated whenever RegA or RegB is written, even if the periodic-timer-related configuration is not changed. This patch optimizes it slightly so that the update happens only if the period or enable status is changed; later patches also depend on this o

[Qemu-devel] [PATCH v3 2/5] mc146818rtc: precisely count the clock for periodic timer

2017-05-10 Thread guangrong . xiao
From: Tai Yunfang There are two issues in the current code: 1) if the period is changed by re-configuring RegA, the coalesced irq will be scaled to reflect the new period; however, it calculates the new interrupt count like this: s->irq_coalesced = (s->irq_coalesced * s->period) / period;

[Qemu-devel] [PATCH v3 3/5] mc146818rtc: ensure LOST_TICK_POLICY_SLEW is only enabled on TARGET_I386

2017-05-10 Thread guangrong . xiao
From: Xiao Guangrong Any tick policy specified on platforms other than TARGET_I386 will silently fall back to LOST_TICK_POLICY_DISCARD; this patch makes sure only TARGET_I386 can enable LOST_TICK_POLICY_SLEW. After that, we can enable LOST_TICK_POLICY_SLEW in the common code, which need not u

[Qemu-devel] [PATCH v3 4/5] mc146818rtc: drop unnecessary '#ifdef TARGET_I386'

2017-05-10 Thread guangrong . xiao
From: Xiao Guangrong If the code purely depends on LOST_TICK_POLICY_SLEW, we can simply drop '#ifdef TARGET_I386' as only x86 can enable this tick policy Signed-off-by: Xiao Guangrong --- hw/timer/mc146818rtc.c | 16 +++- 1 file changed, 3 insertions(+), 13 deletions(-) diff --git

[Qemu-devel] [PATCH v3 0/5] mc146818rtc: fix Windows VM clock faster

2017-05-10 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v3: thanks to Paolo's elaborate review comments, this version significantly simplifies the logic of periodic_timer_update(), which includes: 1) introduce rtc_periodic_clock_ticks(), which takes both RegA and RegB into account and returns the period clock; 2) co

[Qemu-devel] [PATCH v3 5/5] mc146818rtc: embrace all x86 specific code

2017-05-10 Thread guangrong . xiao
From: Xiao Guangrong Introduce a function, rtc_policy_slew_deliver_irq(), which delivers the irq if LOST_TICK_POLICY_SLEW is used. As this policy is only supported on x86, calling it on other platforms will trigger an assert. After that, we can move the x86-specific code to the common place Signed-off-by: Xiao G

[Qemu-devel] [PATCH] qtest: add rtc periodic timer test

2017-05-24 Thread guangrong . xiao
From: Xiao Guangrong It tests the accuracy of the rtc periodic timer, which was recently improved & fixed by: mc146818rtc: precisely count the clock for periodic timer (commit id has not been decided yet) Note: as QEMU needs a precise timer to drive its rtc timer callbacks, that means clock=vm i

[Qemu-devel] [PATCH v2] [PATCH] qtest: add rtc periodic timer test

2017-05-26 Thread guangrong . xiao
From: Xiao Guangrong It tests the accuracy of the rtc periodic timer, which was recently improved & fixed by: mc146818rtc: precisely count the clock for periodic timer (commit id has not been decided yet) Changelog in v2: integrate it with rtc-test by using clock_step_next() to inquire the t

[Qemu-devel] [PATCH 2/2] migration: introduce pages-per-second

2018-12-12 Thread guangrong . xiao
From: Xiao Guangrong It introduces a new statistic, pages-per-second, as bandwidth or mbps is not enough to measure the performance of posting pages out: with compression and xbzrle, the data size can be significantly reduced, so pages-per-second is the one we want Signed
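
A hedged sketch of what such a statistic boils down to (illustrative names, not the QEMU counters): count the pages transferred in one accounting interval and divide by the elapsed wall-clock time.

#include <stdint.h>

/* Update pages-per-second over one accounting interval.
 * 'now_ms' and 'prev_ms' are wall-clock timestamps in milliseconds. */
static uint64_t update_pages_per_second(uint64_t pages_now, uint64_t pages_prev,
                                        int64_t now_ms, int64_t prev_ms)
{
    int64_t elapsed_ms = now_ms - prev_ms;

    if (elapsed_ms <= 0) {
        return 0;
    }
    /* Unlike mbps, this is insensitive to how much compression or xbzrle
     * shrank each page on the wire. */
    return (pages_now - pages_prev) * 1000 / elapsed_ms;
}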

[Qemu-devel] [PATCH 0/2] optimize waiting for free thread to do compression

2018-12-12 Thread guangrong . xiao
From: Xiao Guangrong Currently we have two behaviors if all compression threads are busy: the main thread must wait for one of them to become free if @compress-wait-thread is set to on, or the main thread can return directly without waiting and post the page out as a normal one. Both of them have their pro

[Qemu-devel] [PATCH 1/2] migration: introduce compress-wait-thread-adaptive

2018-12-13 Thread guangrong . xiao
From: Xiao Guangrong Currently we have two behaviors if all compression threads are busy: the main thread must wait for one of them to become free if @compress-wait-thread is set to on, or the main thread can return directly without waiting and post the page out as a normal one. Both of them have their pro

[Qemu-devel] [PATCH v2 0/5] migration: improve multithreads

2018-11-06 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v2: These changes are based on Paolo's suggestion: 1) rename the lockless multithreads model to threaded workqueue 2) hugely improve the internal design: make all the requests a large array, properly partition it, and assign requests to threads respectiv

[Qemu-devel] [PATCH v2 2/5] util: introduce threaded workqueue

2018-11-06 Thread guangrong . xiao
From: Xiao Guangrong This module implements the lockless and efficient threaded workqueue. Three abstracted objects are used in this module: - Request: it not only contains the data that the workqueue fetches out to finish the request but also offers the space to save the result af
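
A hedged sketch of the Request abstraction described above (simplified; field and helper names are illustrative, not the series' API): each request carries both the input a worker consumes and space for its result, plus a state flag published with release semantics.

#include <stdatomic.h>
#include <stddef.h>

enum request_state {
    REQ_FREE,      /* owned by the submitter, may be (re)filled */
    REQ_SUBMITTED, /* handed to a worker thread */
    REQ_DONE,      /* result is ready to be collected */
};

/* One slot in a per-thread request array. */
struct threaded_request {
    _Atomic enum request_state state;
    void *data;        /* input the worker consumes (e.g. a guest page) */
    size_t data_len;
    void *result;      /* space reserved up front for the worker's output */
    size_t result_len;
};

/* Submitter side: hand a filled slot to the worker. */
static void request_submit(struct threaded_request *req)
{
    atomic_store_explicit(&req->state, REQ_SUBMITTED, memory_order_release);
}

/* Worker side: publish the result after filling req->result. */
static void request_complete(struct threaded_request *req)
{
    atomic_store_explicit(&req->state, REQ_DONE, memory_order_release);
}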

[Qemu-devel] [PATCH v2 5/5] tests: add threaded-workqueue-bench

2018-11-06 Thread guangrong . xiao
From: Xiao Guangrong It's the benchmark of threaded-workqueue; it is also a good example of how threaded-workqueue is used Signed-off-by: Xiao Guangrong --- tests/Makefile.include | 5 +- tests/threaded-workqueue-bench.c | 256 +++ 2 files ch

[Qemu-devel] [PATCH v2 4/5] migration: use threaded workqueue for decompression

2018-11-06 Thread guangrong . xiao
From: Xiao Guangrong Adapt the decompression code to the threaded workqueue Signed-off-by: Xiao Guangrong --- migration/ram.c | 225 1 file changed, 81 insertions(+), 144 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index a

[Qemu-devel] [PATCH v2 1/5] bitops: introduce change_bit_atomic

2018-11-06 Thread guangrong . xiao
From: Xiao Guangrong It will be used by the threaded workqueue Signed-off-by: Xiao Guangrong --- include/qemu/bitops.h | 13 + 1 file changed, 13 insertions(+) diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h index 3f0926cf40..c522958852 100644 --- a/include/qemu/bitops.h ++
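
A hedged sketch of what an atomic change-bit helper typically looks like, written with compiler builtins rather than QEMU's own qatomic wrappers (this is not the patch's exact code):

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Atomically toggle bit 'nr' in the bitmap at 'addr'. */
static inline void change_bit_atomic(long nr, unsigned long *addr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_LONG);

    /* GCC/Clang builtin; XOR flips the bit without a read-modify-write race. */
    __atomic_fetch_xor(addr + nr / BITS_PER_LONG, mask, __ATOMIC_SEQ_CST);
}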

[Qemu-devel] [PATCH v2 3/5] migration: use threaded workqueue for compression

2018-11-06 Thread guangrong . xiao
From: Xiao Guangrong Adapt the compression code to the threaded workqueue Signed-off-by: Xiao Guangrong --- migration/ram.c | 313 +--- 1 file changed, 115 insertions(+), 198 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index

[Qemu-devel] [PATCH 3/4] migration: use lockless Multithread model for compression

2018-10-16 Thread guangrong . xiao
From: Xiao Guangrong Adapt the compression code to the lockless multithread model Signed-off-by: Xiao Guangrong --- migration/ram.c | 312 +--- 1 file changed, 115 insertions(+), 197 deletions(-) diff --git a/migration/ram.c b/migration/ram.

[Qemu-devel] [PATCH 0/4] migration: improve multithreads

2018-10-16 Thread guangrong . xiao
From: Xiao Guangrong This is the last part of our previous work: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00526.html This part finally improves the multithread model used by compression and decompression, which makes the compression feature really usable in production.

[Qemu-devel] [PATCH 1/4] ptr_ring: port ptr_ring from linux kernel to QEMU

2018-10-16 Thread guangrong . xiao
From: Xiao Guangrong ptr_ring is good at minimizing cache contention and has a simple memory-barrier model; it will be used by the lockless threads model to pass requests between the main migration thread and the compression threads. Some changes are made: 1) drop unnecessary APIs, e.g., the _irq and _bh AP
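
A hedged sketch of the ptr_ring idea, reduced to the single-producer/single-consumer case and simplified from the kernel design (not the ported code): a fixed array of pointers where NULL marks an empty slot, so producer and consumer need no shared head/tail counters and mostly touch different cache lines.

#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 256  /* power of two keeps index wrap cheap */

struct ptr_ring_spsc {
    void *slots[RING_SIZE];
    unsigned int prod;   /* only touched by the producer */
    unsigned int cons;   /* only touched by the consumer */
};

/* Producer: succeed only if the target slot has been consumed (NULL). */
static bool ring_produce(struct ptr_ring_spsc *r, void *ptr)
{
    unsigned int i = r->prod % RING_SIZE;

    if (__atomic_load_n(&r->slots[i], __ATOMIC_ACQUIRE) != NULL) {
        return false;                       /* ring full */
    }
    __atomic_store_n(&r->slots[i], ptr, __ATOMIC_RELEASE);
    r->prod++;
    return true;
}

/* Consumer: take the next pointer and mark the slot empty again. */
static void *ring_consume(struct ptr_ring_spsc *r)
{
    unsigned int i = r->cons % RING_SIZE;
    void *ptr = __atomic_load_n(&r->slots[i], __ATOMIC_ACQUIRE);

    if (ptr) {
        __atomic_store_n(&r->slots[i], NULL, __ATOMIC_RELEASE);
        r->cons++;
    }
    return ptr;
}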

[Qemu-devel] [PATCH 4/4] migration: use lockless Multithread model for decompression

2018-10-16 Thread guangrong . xiao
From: Xiao Guangrong Adapt the decompression code to the lockless multithread model Signed-off-by: Xiao Guangrong --- migration/ram.c | 223 1 file changed, 78 insertions(+), 145 deletions(-) diff --git a/migration/ram.c b/migration/ram.c

[Qemu-devel] [PATCH 2/4] migration: introduce lockless multithreads model

2018-10-16 Thread guangrong . xiao
From: Xiao Guangrong The current implementation of compression and decompression is very hard to enable in production. We noticed that too many wait-wakeups go to kernel space and CPU usage is very low even if the system is really free. The reasons are: 1) there are too many locks used to do sy

[Qemu-devel] [PATCH v3 0/5] migration: improve multithreads

2018-11-21 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v3: Thanks to Emilio's comments and his example code, the changes in this version are: 1. move @requests from the shared data struct to each single thread 2. move completion ev from the shared data struct to each single thread 3. move bitmaps from the shared data

[Qemu-devel] [PATCH v3 1/5] bitops: introduce change_bit_atomic

2018-11-21 Thread guangrong . xiao
From: Xiao Guangrong It will be used by the threaded workqueue Signed-off-by: Xiao Guangrong --- include/qemu/bitops.h | 13 + 1 file changed, 13 insertions(+) diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h index 3f0926cf40..c522958852 100644 --- a/include/qemu/bitops.h ++

[Qemu-devel] [PATCH v3 2/5] util: introduce threaded workqueue

2018-11-21 Thread guangrong . xiao
From: Xiao Guangrong This module implements the lockless and efficient threaded workqueue. Three abstracted objects are used in this module: - Request: it not only contains the data that the workqueue fetches out to finish the request but also offers the space to save the result af

[Qemu-devel] [PATCH v3 4/5] migration: use threaded workqueue for decompression

2018-11-21 Thread guangrong . xiao
From: Xiao Guangrong Adapt the decompression code to the threaded workqueue Signed-off-by: Xiao Guangrong --- migration/ram.c | 222 1 file changed, 77 insertions(+), 145 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 2

[Qemu-devel] [PATCH v3 3/5] migration: use threaded workqueue for compression

2018-11-21 Thread guangrong . xiao
From: Xiao Guangrong Adapt the compression code to the threaded workqueue Signed-off-by: Xiao Guangrong --- migration/ram.c | 308 1 file changed, 110 insertions(+), 198 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index

[Qemu-devel] [PATCH v3 5/5] tests: add threaded-workqueue-bench

2018-11-21 Thread guangrong . xiao
From: Xiao Guangrong It's the benchmark of threaded-workqueue; it is also a good example of how threaded-workqueue is used Signed-off-by: Xiao Guangrong --- tests/Makefile.include | 5 +- tests/threaded-workqueue-bench.c | 255 +++ 2 files ch

[Qemu-devel] [PATCH v2 0/3] optimize waiting for free thread to do compression

2019-01-10 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v2: squash 'compress-wait-thread-adaptive' into 'compress-wait-thread' based on Peter's suggestion. Currently we have two behaviors if all compression threads are busy: the main thread must wait for one of them to become free if @compress-wait-thread is set to on

[Qemu-devel] [PATCH v2 2/3] migration: fix memory leak when updating tls-creds and tls-hostname

2019-01-10 Thread guangrong . xiao
From: Xiao Guangrong If we update the parameters tls-creds and tls-hostname, these string values are duplicated into local variables in migrate_params_test_apply() using g_strdup(); however, the newly allocated memory is never freed. Actually, they are not used to check anything, so we can direc
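
A hedged illustration of the leak pattern described above (generic GLib code, not the actual migrate_params_test_apply() body): duplicating into a throw-away "test" copy that is never freed leaks; if the copy is not inspected, it can simply be dropped.

#include <glib.h>
#include <stdbool.h>

/* Leaky pattern: duplicate into a throw-away copy used only for validation,
 * then forget to free it. */
static bool check_params_leaky(const char *tls_creds, const char *tls_hostname)
{
    char *creds_copy = g_strdup(tls_creds);       /* never freed below */
    char *hostname_copy = g_strdup(tls_hostname); /* never freed below */

    (void)creds_copy;
    (void)hostname_copy;
    return true;   /* the copies were not actually needed for any check */
}

/* Fix in the spirit of the patch: since the strings are not inspected,
 * drop the intermediate duplication entirely. */
static bool check_params_fixed(const char *tls_creds, const char *tls_hostname)
{
    (void)tls_creds;
    (void)tls_hostname;
    return true;
}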

[Qemu-devel] [PATCH v2 1/3] migration: introduce pages-per-second

2019-01-10 Thread guangrong . xiao
From: Xiao Guangrong It introduces a new statistic, pages-per-second, as bandwidth or mbps is not enough to measure the performance of posting pages out: with compression and xbzrle, the data size can be significantly reduced, so pages-per-second is the one we want Signed

[Qemu-devel] [PATCH v2 3/3] migration: introduce adaptive model for waiting thread

2019-01-10 Thread guangrong . xiao
From: Xiao Guangrong Currently we have two behaviors if all compression threads are busy: the main thread must wait for one of them to become free if @compress-wait-thread is set to on, or the main thread can return directly without waiting and post the page out as a normal one. Both of them have their pro

[Qemu-devel] [PATCH v5 0/4] migration: compression optimization

2018-09-03 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v5: use the way in the older version to handle flush_compressed_data in the iteration, i.e, introduce dirty_sync_count and flush compressed data if the count is changed. That's because we should post the data after QEMU_VM_SECTION_PART has been posted

[Qemu-devel] [PATCH v5 1/4] migration: do not flush_compressed_data at the end of each iteration

2018-09-03 Thread guangrong . xiao
From: Xiao Guangrong flush_compressed_data() needs to wait for all compression threads to finish their work; after that, all threads are free until the migration feeds new requests to them. Reducing its calls can improve the throughput and use CPU resources more effectively. We do not need to flush all t

[Qemu-devel] [PATCH v5 2/4] migration: fix calculating xbzrle_counters.cache_miss_rate

2018-09-03 Thread guangrong . xiao
From: Xiao Guangrong As Peter pointed out: | - xbzrle_counters.cache_miss is done in save_xbzrle_page(), so it's | per-guest-page granularity | | - RAMState.iterations is done for each ram_find_and_save_block(), so | it's per-host-page granularity | | An example is that when we migrate a 2M h
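
A hedged sketch of the granularity point quoted above (illustrative names, not the QEMU counters): the miss rate only makes sense when the numerator and denominator are deltas counted at the same per-target-page granularity over the same sync interval.

/* Compute the xbzrle cache-miss rate for one sync interval.
 * Both counters must be in target-page units, otherwise a 2M huge page
 * inflates one side by 512x and skews the rate. */
static double cache_miss_rate(unsigned long cache_miss,
                              unsigned long cache_miss_prev,
                              unsigned long target_pages,
                              unsigned long target_pages_prev)
{
    unsigned long pages = target_pages - target_pages_prev;

    if (pages == 0) {
        return 0.0;
    }
    return (double)(cache_miss - cache_miss_prev) / pages;
}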

[Qemu-devel] [PATCH v5 4/4] migration: handle the error condition properly

2018-09-03 Thread guangrong . xiao
From: Xiao Guangrong ram_find_and_save_block() can return a negative value if any error happens; however, it is completely ignored in the current code Signed-off-by: Xiao Guangrong --- migration/ram.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/migration/ram.c

[Qemu-devel] [PATCH v5 3/4] migration: show the statistics of compression

2018-09-03 Thread guangrong . xiao
From: Xiao Guangrong Currently, it includes: pages: number of pages compressed and transferred to the target VM; busy: number of times no free thread was available to compress data; busy-rate: rate of thread busy; compressed-size: number of bytes after compression; compression-rate: ratio of compressed size R

[Qemu-devel] [PATCH v6 0/3] migration: compression optimization

2018-09-06 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v6: Thanks to Juan's review, in this version we 1) move flush compressed data to find_dirty_block() where it hits the end of memblock 2) use save_page_use_compression instead of migrate_use_compression in flush_compressed_data Xiao Guangrong (3): migrat

[Qemu-devel] [PATCH v6 1/3] migration: do not flush_compressed_data at the end of iteration

2018-09-06 Thread guangrong . xiao
From: Xiao Guangrong flush_compressed_data() needs to wait for all compression threads to finish their work; after that, all threads are free until the migration feeds new requests to them. Reducing its calls can improve the throughput and use CPU resources more effectively. We do not need to flush all t

[Qemu-devel] [PATCH v6 2/3] migration: show the statistics of compression

2018-09-06 Thread guangrong . xiao
From: Xiao Guangrong Currently, it includes: pages: number of pages compressed and transferred to the target VM; busy: number of times no free thread was available to compress data; busy-rate: rate of thread busy; compressed-size: number of bytes after compression; compression-rate: ratio of compressed size R

[Qemu-devel] [PATCH v6 3/3] migration: use save_page_use_compression in flush_compressed_data

2018-09-06 Thread guangrong . xiao
From: Xiao Guangrong It avoids touching the compression locks if xbzrle and compression are both enabled Signed-off-by: Xiao Guangrong --- migration/ram.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/migration/ram.c b/migration/ram.c index 65a563993d..747dd9208b 100644 --

[Qemu-devel] [PATCH v2 0/8] migration: compression optimization

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong Thanks to Peter's suggestion, I split the long series (1) and this is the first part. I am not sure if Dave is happy with @reduced-size; I will change it immediately if it's objected to. :) Changelog in v2: 1) introduce a parameter to make the main thread wait for a free thread thre

[Qemu-devel] [PATCH v2 2/8] migration: fix counting normal page for compression

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong The compressed page is not a normal page Signed-off-by: Xiao Guangrong --- migration/ram.c | 1 - 1 file changed, 1 deletion(-) diff --git a/migration/ram.c b/migration/ram.c index 0ad234c692..1b016e048d 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1903,7 +1903,6

[Qemu-devel] [PATCH v2 3/8] migration: show the statistics of compression

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong Currently, it includes: pages: number of pages compressed and transferred to the target VM; busy: number of times no free thread was available to compress data; busy-rate: rate of thread busy; reduced-size: number of bytes reduced by compression; compression-rate: ratio of compressed size

[Qemu-devel] [PATCH v2 1/8] migration: do not wait for free thread

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong Instead of putting the main thread to sleep to wait for a free compression thread, we can directly post the page out as a normal page, which reduces the latency and uses CPUs more efficiently. A parameter, compress-wait-thread, is introduced; it can be enabled if the user really wa
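
A hedged sketch of the control flow implied by the parameter (hypothetical helper names passed as callbacks; this is not the migration code itself): if no compression thread is idle and waiting is disabled, the page falls back to the normal uncompressed path instead of blocking the main thread.

#include <stdbool.h>

/* Returns true if the page was queued for compression, false if the caller
 * should send it as a normal (uncompressed) page. */
static bool try_compress_page(bool compress_wait_thread,
                              bool (*grab_idle_thread)(void),
                              void (*wait_for_idle_thread)(void))
{
    if (grab_idle_thread()) {
        return true;                 /* a free thread picked the page up */
    }
    if (!compress_wait_thread) {
        return false;                /* don't block: fall back to normal send */
    }
    wait_for_idle_thread();          /* old behavior: sleep until one is free */
    return grab_idle_thread();
}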

[Qemu-devel] [PATCH v2 4/8] migration: introduce save_zero_page_to_file

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong It will be used by the compression threads Signed-off-by: Xiao Guangrong --- migration/ram.c | 40 ++-- 1 file changed, 30 insertions(+), 10 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index e68b0e6dec..ce6e69b649 100644

[Qemu-devel] [PATCH v2 5/8] migration: drop the return value of do_compress_ram_page

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong It is not used, and dropping it cleans the code up a little Signed-off-by: Xiao Guangrong --- migration/ram.c | 26 +++--- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index ce6e69b649..5aa624b3b9 100644 --- a/mig

[Qemu-devel] [PATCH v2 8/8] migration: do not flush_compressed_data at the end of each iteration

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong flush_compressed_data() needs to wait for all compression threads to finish their work; after that, all threads are free until the migration feeds new requests to them. Reducing its calls can improve the throughput and use CPU resources more effectively. We do not need to flush all t

[Qemu-devel] [PATCH v2 6/8] migration: move handle of zero page to the thread

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong Detecting a zero page is not light work; move it to the thread to speed the main thread up Signed-off-by: Xiao Guangrong --- migration/ram.c | 112 +++- 1 file changed, 78 insertions(+), 34 deletions(-) diff --git a/mi

[Qemu-devel] [PATCH v2 7/8] migration: hold the lock only if it is really needed

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong Try to hold src_page_req_mutex only if the queue is not empty Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Xiao Guangrong --- include/qemu/queue.h | 1 + migration/ram.c | 4 2 files changed, 5 insertions(+) diff --git a/include/qemu/queue.h b/include/qem
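
A hedged sketch of the "check before locking" pattern the patch describes (generic pthread code, not the migration source; the queue-empty check is approximated with an atomic load of the head pointer rather than QEMU's queue macros):

#include <pthread.h>
#include <stddef.h>

struct req { struct req *next; };

struct req_queue {
    struct req *head;            /* NULL means empty */
    pthread_mutex_t lock;
};

/* Fast path: peek at the head without the lock; only lock when there is
 * actually something to dequeue. The queue may become non-empty right after
 * the check, but that request is simply picked up on the next call. */
static struct req *unqueue_request(struct req_queue *q)
{
    struct req *r = NULL;

    if (__atomic_load_n(&q->head, __ATOMIC_ACQUIRE) == NULL) {
        return NULL;
    }
    pthread_mutex_lock(&q->lock);
    if (q->head) {
        r = q->head;
        q->head = r->next;
    }
    pthread_mutex_unlock(&q->lock);
    return r;
}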

[Qemu-devel] [PATCH] migration: introduce decompress-error-check

2018-04-26 Thread guangrong . xiao
From: Xiao Guangrong QEMU 2.13 enables a strict check for compression & decompression to make the migration more robust; that depends on the source having fixed the internal design which triggers the unexpected error conditions. To make it work when migrating an old-version QEMU to QEMU 2.13, we introduce

[Qemu-devel] [PATCH] migration: fix saving normal page even if it's been compressed

2018-04-28 Thread guangrong . xiao
From: Xiao Guangrong Fix the bug introduced by da3f56cb2e767016 (migration: remove ram_save_compressed_page()); it should be 'return' rather than 'res'. Sorry for this stupid mistake :( Signed-off-by: Xiao Guangrong --- migration/ram.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) dif

[Qemu-devel] [PATCH 00/12] migration: improve multithreads for compression and decompression

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Background -- The current implementation of compression and decompression is very hard to enable in production. We noticed that too many wait-wakeups go to kernel space and CPU usage is very low even if the system is really free. The reasons are: 1) there are two ma

[Qemu-devel] [PATCH 03/12] migration: fix counting xbzrle cache_miss_rate

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Sync up xbzrle_cache_miss_prev only after migration iteration goes forward Signed-off-by: Xiao Guangrong --- migration/ram.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/migration/ram.c b/migration/ram.c index dbf24d8c87..dd1283dd45 100644 --- a/migr

[Qemu-devel] [PATCH 02/12] migration: fix counting normal page for compression

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong The compressed page is not a normal page Signed-off-by: Xiao Guangrong --- migration/ram.c | 1 - 1 file changed, 1 deletion(-) diff --git a/migration/ram.c b/migration/ram.c index 0caf32ab0a..dbf24d8c87 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1432,7 +1432,6

[Qemu-devel] [PATCH 05/12] migration: show the statistics of compression

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Then the users can adjust the parameters based on this info. Currently, it includes: pages: number of pages compressed and transferred to the target VM; busy: number of times no free thread was available to compress data; busy-rate: rate of thread busy; reduced-size: amount of bytes reduc

[Qemu-devel] [PATCH 06/12] migration: do not detect zero page for compression

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Detecting a zero page is not light work; we can disable it for compression, which can handle all-zero data very well Signed-off-by: Xiao Guangrong --- migration/ram.c | 44 +++- 1 file changed, 23 insertions(+), 21 deletions(-) diff -

[Qemu-devel] [PATCH 04/12] migration: introduce migration_update_rates

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong It is used to slightly clean the code up; no logic is changed Signed-off-by: Xiao Guangrong --- migration/ram.c | 35 ++- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index dd1283dd45..ee0

[Qemu-devel] [PATCH 01/12] migration: do not wait if no free thread

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Instead of putting the main thread to sleep to wait for a free compression thread, we can directly post the page out as a normal page, which reduces the latency and uses CPUs more efficiently Signed-off-by: Xiao Guangrong --- migration/ram.c | 34 +++-

[Qemu-devel] [PATCH 09/12] ring: introduce lockless ring buffer

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong It's a simple lockless ring buffer implementation which supports both single producer vs. single consumer and multiple producers vs. single consumer. Many lessons were learned from the Linux kernel's kfifo (1) and DPDK's rte_ring (2) before I wrote this implementation. It corrects some

[Qemu-devel] [PATCH 07/12] migration: hold the lock only if it is really needed

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Try to hold src_page_req_mutex only if the queue is not empty Signed-off-by: Xiao Guangrong --- include/qemu/queue.h | 1 + migration/ram.c | 4 2 files changed, 5 insertions(+) diff --git a/include/qemu/queue.h b/include/qemu/queue.h index 59fd1203a1..ac418efc4

[Qemu-devel] [PATCH 08/12] migration: do not flush_compressed_data at the end of each iteration

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong flush_compressed_data() needs to wait for all compression threads to finish their work; after that, all threads are free until the migration feeds new requests to them. Reducing its calls can improve the throughput and use CPU resources more effectively. We do not need to flush all th

[Qemu-devel] [PATCH 11/12] migration: use lockless Multithread model for compression

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Adapt the compression code to the lockless multithread model Signed-off-by: Xiao Guangrong --- migration/ram.c | 412 ++-- 1 file changed, 161 insertions(+), 251 deletions(-) diff --git a/migration/ram.c b/migration/ram.

[Qemu-devel] [PATCH 10/12] migration: introduce lockless multithreads model

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong The current implementation of compression and decompression is very hard to enable in production. We noticed that too many wait-wakeups go to kernel space and CPU usage is very low even if the system is really free. The reasons are: 1) there are too many locks used to do sy

[Qemu-devel] [PATCH 12/12] migration: use lockless Multithread model for decompression

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Adapt the decompression code to the lockless multithread model Signed-off-by: Xiao Guangrong --- migration/ram.c | 381 ++-- 1 file changed, 175 insertions(+), 206 deletions(-) diff --git a/migration/ram.c b/migration/ram.

[Qemu-devel] [PATCH v3 00/10] migration: compression optimization

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v3: Thanks to Peter's comments, the changes in this version are: 1) make compress-wait-thread default to true to keep the current behavior; 2) save the compressed size instead of the reduced size and fix calculating the compression ratio; 3) fix calculating xbzrle_count

[Qemu-devel] [PATCH v3 01/10] migration: do not wait for free thread

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong Instead of putting the main thread to sleep to wait for a free compression thread, we can directly post the page out as a normal page, which reduces the latency and uses CPUs more efficiently. A parameter, compress-wait-thread, is introduced; it can be enabled if the user really wa

[Qemu-devel] [PATCH v3 07/10] migration: do not flush_compressed_data at the end of each iteration

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong flush_compressed_data() needs to wait for all compression threads to finish their work; after that, all threads are free until the migration feeds new requests to them. Reducing its calls can improve the throughput and use CPU resources more effectively. We do not need to flush all t

[Qemu-devel] [PATCH v3 05/10] migration: move handle of zero page to the thread

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong Detecting a zero page is not light work; move it to the thread to speed the main thread up. By the way, handling ram_release_pages() for the zero page is moved to the thread as well Signed-off-by: Xiao Guangrong --- migration/ram.c | 96 +

[Qemu-devel] [PATCH v3 03/10] migration: introduce save_zero_page_to_file

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong It will be used by the compression threads Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 40 ++-- 1 file changed, 30 insertions(+), 10 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index d631b9

[Qemu-devel] [PATCH v3 02/10] migration: fix counting normal page for compression

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong The compressed page is not a normal page Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 1 - 1 file changed, 1 deletion(-) diff --git a/migration/ram.c b/migration/ram.c index ae9e83c2b6..d631b9a6fe 100644 --- a/migration/ram.c +++ b/migration/ra

[Qemu-devel] [PATCH v3 04/10] migration: drop the return value of do_compress_ram_page

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong It is not used, and dropping it cleans the code up a little Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 26 +++--- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 49ace30614..e463

[Qemu-devel] [PATCH v3 09/10] migration: fix calculating xbzrle_counters.cache_miss_rate

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong As Peter pointed out: | - xbzrle_counters.cache_miss is done in save_xbzrle_page(), so it's | per-guest-page granularity | | - RAMState.iterations is done for each ram_find_and_save_block(), so | it's per-host-page granularity | | An example is that when we migrate a 2M h

[Qemu-devel] [PATCH v3 10/10] migration: show the statistics of compression

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong Currently, it includes: pages: number of pages compressed and transferred to the target VM; busy: number of times no free thread was available to compress data; busy-rate: rate of thread busy; compressed-size: number of bytes after compression; compression-rate: ratio of compressed size S

[Qemu-devel] [PATCH v3 06/10] migration: hold the lock only if it is really needed

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong Try to hold src_page_req_mutex only if the queue is not empty Reviewed-by: Dr. David Alan Gilbert Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- include/qemu/queue.h | 1 + migration/ram.c | 4 2 files changed, 5 insertions(+) diff --git a/include/qem

[Qemu-devel] [PATCH v3 08/10] migration: handle the error condition properly

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong ram_find_and_save_block() can return a negative value if any error happens; however, it is completely ignored in the current code Signed-off-by: Xiao Guangrong --- migration/ram.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/migration/ram.c

[Qemu-devel] [PATCH v4 00/10] migration: compression optimization

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v4: These changes are based on the suggestions from Peter and Eric. 1) improve qapi's grammar 2) move calling flush_compressed_data to migration_bitmap_sync() 3) rename 'handle_pages' to 'target_page_count' Note: there is still no clear way to fix handling the error

[Qemu-devel] [PATCH v4 02/10] migration: fix counting normal page for compression

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong The compressed page is not a normal page Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 1 - 1 file changed, 1 deletion(-) diff --git a/migration/ram.c b/migration/ram.c index ae9e83c2b6..d631b9a6fe 100644 --- a/migration/ram.c +++ b/migration/ra

[Qemu-devel] [PATCH v4 04/10] migration: drop the return value of do_compress_ram_page

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong It is not used, and dropping it cleans the code up a little Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 26 +++--- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 49ace30614..e463

[Qemu-devel] [PATCH v4 01/10] migration: do not wait for free thread

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong Instead of putting the main thread to sleep to wait for a free compression thread, we can directly post the page out as a normal page, which reduces the latency and uses CPUs more efficiently. A parameter, compress-wait-thread, is introduced; it can be enabled if the user really wa

[Qemu-devel] [PATCH v4 07/10] migration: do not flush_compressed_data at the end of each iteration

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong flush_compressed_data() needs to wait for all compression threads to finish their work; after that, all threads are free until the migration feeds new requests to them. Reducing its calls can improve the throughput and use CPU resources more effectively. We do not need to flush all t

[Qemu-devel] [PATCH v4 03/10] migration: introduce save_zero_page_to_file

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong It will be used by the compression threads Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 40 ++-- 1 file changed, 30 insertions(+), 10 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index d631b9

[Qemu-devel] [PATCH v4 06/10] migration: hold the lock only if it is really needed

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong Try to hold src_page_req_mutex only if the queue is not empty Reviewed-by: Dr. David Alan Gilbert Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- include/qemu/queue.h | 1 + migration/ram.c | 4 2 files changed, 5 insertions(+) diff --git a/include/qem

[Qemu-devel] [PATCH v4 08/10] migration: fix calculating xbzrle_counters.cache_miss_rate

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong As Peter pointed out: | - xbzrle_counters.cache_miss is done in save_xbzrle_page(), so it's | per-guest-page granularity | | - RAMState.iterations is done for each ram_find_and_save_block(), so | it's per-host-page granularity | | An example is that when we migrate a 2M h

[Qemu-devel] [PATCH v4 05/10] migration: move handle of zero page to the thread

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong Detecting a zero page is not light work; move it to the thread to speed the main thread up. By the way, handling ram_release_pages() for the zero page is moved to the thread as well Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 96 ++

[Qemu-devel] [PATCH v4 09/10] migration: show the statistics of compression

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong Currently, it includes: pages: number of pages compressed and transferred to the target VM; busy: number of times no free thread was available to compress data; busy-rate: rate of thread busy; compressed-size: number of bytes after compression; compression-rate: ratio of compressed size R
