On Wed, Jun 14, 2023 at 1:50 AM Juan Quintela <quint...@redhat.com> wrote:
> ~hyman <hy...@git.sr.ht> wrote: > > From: Hyman Huang(黄勇) <yong.hu...@smartx.com> > > To speed thinkng up, 1-5 are included on next Migration PULL request. > OK, I'll post the next version only contain the last 3 commits. > > Implement dirty-limit convergence algo for live migration, > > which is kind of like auto-converge algo but using dirty-limit > > instead of cpu throttle to make migration convergent. > > > > Enable dirty page limit if dirty_rate_high_cnt greater than 2 > > when dirty-limit capability enabled, Disable dirty-limit if > > migration be cancled. > > Nit: canceled. get it. > > > > Note that "set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit" > > commands are not allowed during dirty-limit live migration. > > > > Signed-off-by: Hyman Huang(黄勇) <yong.hu...@smartx.com> > > Signed-off-by: Markus Armbruster <arm...@redhat.com> > > > > + * Enable dirty-limit to throttle down the guest > > + */ > > +static void migration_dirty_limit_guest(void) > > +{ > > + static int64_t quota_dirtyrate; > > quota_dirtyrate deserves at least a comment. > > I guess it means the current quota_dirty_rate that is set, but no clue. OK. I'll comment it next version. > > > + MigrationState *s = migrate_get_current(); > > + > > + /* > > + * If dirty limit already enabled and migration parameter > > + * vcpu-dirty-limit untouched. > > + */ > > + if (dirtylimit_in_service() && > > + quota_dirtyrate == s->parameters.vcpu_dirty_limit) { > > + return; > > + } > > + > > + quota_dirtyrate = s->parameters.vcpu_dirty_limit; > > + > > + /* Set or update quota dirty limit */ > > + qmp_set_vcpu_dirty_limit(false, -1, quota_dirtyrate, NULL); > > Care to explain why do we have to "reset" the quota? Or why we can't > set it when the user issues the command, only when we throttle the guest? Indeed, -1 is misleading, the first parameter means the set all vcpu a quota dirtyrate, and the second parameter is meaningless if the first parameter is false. The comment will be like this next version? /* Set all vCPU a quota dirtyrate, note that the second parameter will be ignored if setting all vCPU for a vm. */ > > + trace_migration_dirty_limit_guest(quota_dirtyrate); > > +} > > + > > Split this patch in two: > > a - the logic change > b - the introduction of dirty limit. > > Ok, get it. > > Old code: > > /* During block migration the auto-converge logic incorrectly detects > * that ram migration makes no progress. Avoid this by disabling the > * throttling logic during the bulk phase of block migration. */ > if (blk_mig_bulk_active()) { > return; > } > > if (migrate_auto_converge()) { > /* The following detection logic can be refined later. For now: > Check to see if the ratio between dirtied bytes and the approx. > amount of bytes that just got transferred since the last time > we were in this routine reaches the threshold. If that happens > twice, start or increase throttling. */ > > if ((bytes_dirty_period > bytes_dirty_threshold) && > (++rs->dirty_rate_high_cnt >= 2)) { > trace_migration_throttle(); > rs->dirty_rate_high_cnt = 0; > mig_throttle_guest_down(bytes_dirty_period, > bytes_dirty_threshold); > } > } > > New code: > /* > * The following detection logic can be refined later. For now: > * Check to see if the ratio between dirtied bytes and the approx. > * amount of bytes that just got transferred since the last time > * we were in this routine reaches the threshold. If that happens > * twice, start or increase throttling. > */ > > if ((bytes_dirty_period > bytes_dirty_threshold) && > (++rs->dirty_rate_high_cnt >= 2)) { > rs->dirty_rate_high_cnt = 0; > /* > * During block migration the auto-converge logic incorrectly > detects > * that ram migration makes no progress. Avoid this by disabling > the > * throttling logic during the bulk phase of block migration > */ > if (blk_mig_bulk_active()) { > return; > } > > if (migrate_auto_converge()) { > trace_migration_throttle(); > mig_throttle_guest_down(bytes_dirty_period, > bytes_dirty_threshold); > } else if (migrate_dirty_limit()) { > migration_dirty_limit_guest(); > } > } > > Questions: > > - Why are we changing blk_mig_bulk_active() position? > I think that the old code have it in the right place. Additionally, > you just changefd to this version a couple of patches agon. > Yes, indeed, this modification make no sense, i'll fix it next version. > > > > > > int64_t cpu_index, > > Error **errp) > > { > > + MigrationState *ms = migrate_get_current(); > > + > > if (!kvm_enabled() || !kvm_dirty_ring_enabled()) { > > return; > > } > > @@ -453,6 +455,15 @@ void qmp_cancel_vcpu_dirty_limit(bool has_cpu_index, > > return; > > } > > > > + if (migration_is_running(ms->state) && > > + (!qemu_thread_is_self(&ms->thread)) && > > + migrate_dirty_limit() && > > + dirtylimit_in_service()) { > > + error_setg(errp, "can't cancel dirty page limit while" > > + " migration is running"); > > Error message is bad or wrong. > You can cancel the dirty page, you ust need to be on the main thread. > > Or I am missing something? > > Migration, IMHO, shares the same quota dirty rate stored in the global variable "dirtylimit_state ", if we cancel the dirty limit, it will make the throttle not work and the migration will be affected. > > > > + return; > > + } > > + > > dirtylimit_state_lock(); > > > > if (has_cpu_index) { > > @@ -488,6 +499,8 @@ void qmp_set_vcpu_dirty_limit(bool has_cpu_index, > > uint64_t dirty_rate, > > Error **errp) > > { > > + MigrationState *ms = migrate_get_current(); > > + > > if (!kvm_enabled() || !kvm_dirty_ring_enabled()) { > > error_setg(errp, "dirty page limit feature requires KVM with" > > " accelerator property 'dirty-ring-size' set'") > > @@ -504,6 +517,15 @@ void qmp_set_vcpu_dirty_limit(bool has_cpu_index, > > return; > > } > > > > + if (migration_is_running(ms->state) && > > + (!qemu_thread_is_self(&ms->thread)) && > > + migrate_dirty_limit() && > > + dirtylimit_in_service()) { > > + error_setg(errp, "can't cancel dirty page limit while" > > + " migration is running"); > > + return; > > + } > > If you use such a complex expression twice, I think that creating a > helper function is a good idea. > Ok, get it > > Later, Juan. > > Hyman -- Best regards