On Tue, Oct 1, 2024 at 11:28 PM Peter Xu <pet...@redhat.com> wrote:
> On Tue, Oct 01, 2024 at 10:02:53AM +0800, Yong Huang wrote: > > On Tue, Oct 1, 2024 at 4:41 AM Peter Xu <pet...@redhat.com> wrote: > > > > > On Mon, Sep 30, 2024 at 01:14:26AM +0800, yong.hu...@smartx.com wrote: > > > > From: Hyman Huang <yong.hu...@smartx.com> > > > > > > > > When VM is configured with huge memory, the current throttle logic > > > > doesn't look like to scale, because migration_trigger_throttle() > > > > is only called for each iteration, so it won't be invoked for a long > > > > time if one iteration can take a long time. > > > > > > > > The background dirty sync aim to fix the above issue by synchronizing > > > > the ramblock from remote dirty bitmap and, when necessary, triggering > > > > the CPU throttle multiple times during a long iteration. > > > > > > > > This is a trade-off between synchronization overhead and CPU throttle > > > > impact. > > > > > > > > Signed-off-by: Hyman Huang <yong.hu...@smartx.com> > > > > --- > > > > include/migration/misc.h | 3 ++ > > > > migration/migration.c | 11 +++++++ > > > > migration/ram.c | 64 > ++++++++++++++++++++++++++++++++++++ > > > > migration/ram.h | 3 ++ > > > > migration/trace-events | 1 + > > > > system/cpu-timers.c | 2 ++ > > > > tests/qtest/migration-test.c | 29 ++++++++++++++++ > > > > 7 files changed, 113 insertions(+) > > > > > > > > diff --git a/include/migration/misc.h b/include/migration/misc.h > > > > index bfadc5613b..67c00d98f5 100644 > > > > --- a/include/migration/misc.h > > > > +++ b/include/migration/misc.h > > > > @@ -111,4 +111,7 @@ bool migration_in_bg_snapshot(void); > > > > /* migration/block-dirty-bitmap.c */ > > > > void dirty_bitmap_mig_init(void); > > > > > > > > +/* migration/ram.c */ > > > > +void bg_ram_dirty_sync_init(void); > > > > > > IMO it's better we don't add this logic to ram.c as I mentioned. It's > > > closely relevant to auto converge so I think cpu-throttle.c is more > > > suitable. > > > > > > > > + > > > > #endif > > > > diff --git a/migration/migration.c b/migration/migration.c > > > > index 3dea06d577..224b5dfb4f 100644 > > > > --- a/migration/migration.c > > > > +++ b/migration/migration.c > > > > @@ -3285,6 +3285,9 @@ static void > > > migration_iteration_finish(MigrationState *s) > > > > { > > > > /* If we enabled cpu throttling for auto-converge, turn it off. > */ > > > > cpu_throttle_stop(); > > > > + if (migrate_auto_converge()) { > > > > + bg_ram_dirty_sync_timer_enable(false); > > > > + } > > > > > > Please avoid adding this code. When it's part of auto-converge, > > > cpu_throttle_stop() should already guarantee that timer disabled > > > altogether. > > > > > > > > > > > > bql_lock(); > > > > switch (s->state) { > > > > @@ -3526,6 +3529,14 @@ static void *migration_thread(void *opaque) > > > > > > > > trace_migration_thread_setup_complete(); > > > > > > > > + /* > > > > + * Tick the background ramblock dirty sync timer after setup > > > > + * phase. > > > > + */ > > > > + if (migrate_auto_converge()) { > > > > + bg_ram_dirty_sync_timer_enable(true); > > > > + } > > > > > > Might be good to still stick the enablement with auto-converge; the > latter > > > was done in migration_trigger_throttle(). Maybe also enable the timer > only > > > there? > > > > > > > Ok. > > > > > > > > + > > > > while (migration_is_active()) { > > > > if (urgent || !migration_rate_exceeded(s->to_dst_file)) { > > > > MigIterateState iter_state = migration_iteration_run(s); > > > > diff --git a/migration/ram.c b/migration/ram.c > > > > index 67ca3d5d51..995bae1ac9 100644 > > > > --- a/migration/ram.c > > > > +++ b/migration/ram.c > > > > @@ -110,6 +110,12 @@ > > > > */ > > > > #define MAPPED_RAM_LOAD_BUF_SIZE 0x100000 > > > > > > > > +/* Background ramblock dirty sync trigger every five seconds */ > > > > +#define BG_RAM_SYNC_TIMESLICE_MS 5000 > > > > +#define BG_RAM_SYNC_TIMER_INTERVAL_MS 1000 > > > > > > Why this timer needs to be invoked every 1sec, if it's a 5sec timer? > > > > > > > The logic is stupid :(, I'll fix that. > > > > > > > > + > > > > +static QEMUTimer *bg_ram_dirty_sync_timer; > > > > + > > > > XBZRLECacheStats xbzrle_counters; > > > > > > > > /* used by the search for pages to send */ > > > > @@ -4543,6 +4549,64 @@ static void > > > ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host, > > > > } > > > > } > > > > > > > > +static void bg_ram_dirty_sync_timer_tick(void *opaque) > > > > > > Please consider moving this function to cpu-throttle.c. > > > > > > > Yes, I got your idea, but, IMHO, the cpu-throttle.c only implements the > CPU > > throttle, not the ramblock dirty sync, the migration_bitmap_sync_precopy > > could or should not be called in the cpu-throttle.c.Do we want to change > > this code level? > > > > Another way is we define the bg_ram_dirty_sync_timer in cpu-throttle.c > > and modify it in bg_ram_dirty_sync_timer_tick as a extern variable in > ram.c > > I prefer the latter, What do you think of it? > > I think it's better all logic resides in cpu-throttle.c. > > You can have one pre-requisite patch remove "rs" parameter in > migration_bitmap_sync_precopy(), then export it in migration/misc.h. > > Maybe you also need to export time_last_bitmap_sync, you can make another > helper for that and also put it in misc.h too. > > Said that all, I wonder whether we should move cpu-throttle.c under > migration/, as that's only used in migration.. then we can avoid exporting > in misc.h, but export them in migration.h (which is for internal only). > > > > > > > > > > Also please prefix the functions with cpu_throttle_*(), rather than > bg_*(). > > > It should be part of auto converge function. > > > > > > > +{ > > > > + static int prev_pct; > > > > + static uint64_t prev_sync_cnt = 2; > > > > + uint64_t sync_cnt = stat64_get(&mig_stats.dirty_sync_count); > > > > + int cur_pct = cpu_throttle_get_percentage(); > > > > + > > > > + if (prev_pct && !cur_pct) { > > > > + /* CPU throttle timer stopped, so do we */ > > > > + return; > > > > + } > > > > > > Didn't follow why this is not needed if cpu throttle is 0. > > > > > > > The sequence in my head is: > > > > bg dirty sync -> mig_throttle_guest_down -> throttle -> throttle stop-> > bg > > dirty sync stop > > > > The bg dirty sync may be invoked before throttle, and > > the throttle is also 0 at that time, if we code like: > > > > if (!cpu_throttle_get_percentage()) { > > return; > > } > > > > The bg dirty sync timer can tick only once and stop. > > > > If we change the sequence as following, we can ignore this case: > > > > mig_throttle_guest_down -> bg dirty sync -> throttle -> throttle stop-> > bg > > dirty sync stop > > > > But as the code in migration_trigger_throttle indicate: > > > > if ((bytes_dirty_period > bytes_dirty_threshold) && > > ++rs->dirty_rate_high_cnt >= 2) { > > ... > > } > > > > Since the 1st iteration can not satisfy the condition > rs->dirty_rate_high_cnt > > >= 2 > > as usual, the mig_throttle_guest_down gets invoked in 3nd iteration with > > high > > probability. If the 2nd iteration is very long, the bg dirty sync can not > > be invoked and > > we may lose the chance to trigger CPU throttle as I mentioned. > > I wonder whether it's working if we simply put: > > if (!migrate_auto_converge()) { > return; > } > > I think this timer should kick even if !cpu_throttle_active(), becuase > !active doesn't mean the feature is off. In this case the feature is > enabled as long as migrate_auto_converge()==true. > > This addes another reason that maybe we want to move system/cpu-throttle.c > under migration/.. because otherwise we'll need to export > migrate_auto_converge() once more. > > > > > > > > It means dirty rate is probably very low, but then shouldn't this > > > background sync still working (just to notice it grows again)? > > > > > > > + > > > > + /* > > > > + * The first iteration copies all memory anyhow and has no > > > > + * effect on guest performance, therefore omit it to avoid > > > > + * paying extra for the sync penalty. > > > > + */ > > > > + if (sync_cnt <= 1) { > > > > + goto end; > > > > + } > > > > + > > > > + if (sync_cnt == prev_sync_cnt) { > > > > + int64_t curr_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); > > > > + assert(ram_state); > > > > + if ((curr_time - ram_state->time_last_bitmap_sync) > > > > > + BG_RAM_SYNC_TIMESLICE_MS) { > > > > + trace_bg_ram_dirty_sync(); > > > > + WITH_RCU_READ_LOCK_GUARD() { > > > > + migration_bitmap_sync_precopy(ram_state, false); > > > > + } > > > > + } > > > > + } > > > > + > > > > +end: > > > > + prev_sync_cnt = stat64_get(&mig_stats.dirty_sync_count); > > > > + prev_pct = cpu_throttle_get_percentage(); > > > > + > > > > + timer_mod(bg_ram_dirty_sync_timer, > > > > + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) + > > > > + BG_RAM_SYNC_TIMER_INTERVAL_MS); > > > > > > IIUC we only need to fire per 5sec, not 1s? > > > > > > > Thanks to point out, I'll refine this logic. > > > > > > > > > > > +} > > > > + > > > > +void bg_ram_dirty_sync_timer_enable(bool enable) > > > > +{ > > > > + if (enable) { > > > > + bg_ram_dirty_sync_timer_tick(NULL); > > > > + } else { > > > > + timer_del(bg_ram_dirty_sync_timer); > > > > + } > > > > +} > > > > + > > > > +void bg_ram_dirty_sync_init(void) > > > > +{ > > > > + bg_ram_dirty_sync_timer = > > > > + timer_new_ms(QEMU_CLOCK_VIRTUAL_RT, > > > > + bg_ram_dirty_sync_timer_tick, NULL); > > > > +} > > > > > > IMHO all these functions should move to cpu-throttle.c. > > > > > > > + > > > > static RAMBlockNotifier ram_mig_ram_notifier = { > > > > .ram_block_resized = ram_mig_ram_block_resized, > > > > }; > > > > diff --git a/migration/ram.h b/migration/ram.h > > > > index bc0318b834..9c1a2f30f1 100644 > > > > --- a/migration/ram.h > > > > +++ b/migration/ram.h > > > > @@ -93,4 +93,7 @@ void ram_write_tracking_prepare(void); > > > > int ram_write_tracking_start(void); > > > > void ram_write_tracking_stop(void); > > > > > > > > +/* Background ramblock dirty sync */ > > > > +void bg_ram_dirty_sync_timer_enable(bool enable); > > > > + > > > > #endif > > > > diff --git a/migration/trace-events b/migration/trace-events > > > > index c65902f042..3f09e7f383 100644 > > > > --- a/migration/trace-events > > > > +++ b/migration/trace-events > > > > @@ -90,6 +90,7 @@ put_qlist_end(const char *field_name, const char > > > *vmsd_name) "%s(%s)" > > > > qemu_file_fclose(void) "" > > > > > > > > # ram.c > > > > +bg_ram_dirty_sync(void) "" > > > > get_queued_page(const char *block_name, uint64_t tmp_offset, > unsigned > > > long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx" > > > > get_queued_page_not_dirty(const char *block_name, uint64_t > tmp_offset, > > > unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx" > > > > migration_bitmap_sync_start(void) "" > > > > diff --git a/system/cpu-timers.c b/system/cpu-timers.c > > > > index 0b31c9a1b6..64f0834be4 100644 > > > > --- a/system/cpu-timers.c > > > > +++ b/system/cpu-timers.c > > > > @@ -25,6 +25,7 @@ > > > > #include "qemu/osdep.h" > > > > #include "qemu/cutils.h" > > > > #include "migration/vmstate.h" > > > > +#include "migration/misc.h" > > > > #include "qapi/error.h" > > > > #include "qemu/error-report.h" > > > > #include "sysemu/cpus.h" > > > > @@ -274,4 +275,5 @@ void cpu_timers_init(void) > > > > vmstate_register(NULL, 0, &vmstate_timers, &timers_state); > > > > > > > > cpu_throttle_init(); > > > > + bg_ram_dirty_sync_init(); > > > > } > > > > diff --git a/tests/qtest/migration-test.c > b/tests/qtest/migration-test.c > > > > index d6768d5d71..3296f5244d 100644 > > > > --- a/tests/qtest/migration-test.c > > > > +++ b/tests/qtest/migration-test.c > > > > @@ -468,6 +468,12 @@ static void migrate_ensure_converge(QTestState > *who) > > > > migrate_set_parameter_int(who, "downtime-limit", 30 * 1000); > > > > } > > > > > > > > +static void migrate_ensure_iteration_last_long(QTestState *who) > > > > +{ > > > > + /* Set 10Byte/s bandwidth limit to make the iteration last long > > > enough */ > > > > + migrate_set_parameter_int(who, "max-bandwidth", 10); > > > > +} > > > > + > > > > /* > > > > * Our goal is to ensure that we run a single full migration > > > > * iteration, and also dirty memory, ensuring that at least > > > > @@ -2791,6 +2797,7 @@ static void test_migrate_auto_converge(void) > > > > * so we need to decrease a bandwidth. > > > > */ > > > > const int64_t init_pct = 5, inc_pct = 25, max_pct = 95; > > > > + uint64_t prev_dirty_sync_cnt, dirty_sync_cnt; > > > > > > > > if (test_migrate_start(&from, &to, uri, &args)) { > > > > return; > > > > @@ -2827,6 +2834,28 @@ static void test_migrate_auto_converge(void) > > > > } while (true); > > > > /* The first percentage of throttling should be at least > init_pct */ > > > > g_assert_cmpint(percentage, >=, init_pct); > > > > + > > > > + /* Make sure the iteration last a long time enough */ > > > > + migrate_ensure_iteration_last_long(from); > > > > > > There's already migrate_ensure_non_converge(), why this is needed? > > > > > > > I'm feeling that migrate_ensure_non_converge cannot ensure the > > iteration lasts greater than 5s, so i and an extra util function. > > non_converge sets it to 3MB/s, while all qtest mem should be >=100MB on all > archs, it looks ok as of now. Maybe add a comment would suffice? > > It's not extremely bad to even miss one here in unit test, if we target > this feature as pretty much optional on top of auto converge. If you want > to provide a solid test, you can add a stat counter and check it here with > query-migrate. But again it'll be better move cpu-throttle.c over first, > or it requires another export in misc.h.. > > > > > > > > > + > > > > + /* > > > > + * End the loop when the dirty sync count greater than 1. > > > > + */ > > > > + while ((dirty_sync_cnt = get_migration_pass(from)) < 2) { > > > > + usleep(1000 * 1000); > > > > + } > > > > + > > > > + prev_dirty_sync_cnt = dirty_sync_cnt; > > > > + > > > > + /* > > > > + * The dirty sync count must changes in 5 seconds, here we > > > > + * plus 1 second as error value. > > > > + */ > > > > + sleep(5 + 1); > > > > + > > > > + dirty_sync_cnt = get_migration_pass(from); > > > > + g_assert_cmpint(dirty_sync_cnt, != , prev_dirty_sync_cnt); > > > > + > > > > /* Now, when we tested that throttling works, let it converge */ > > > > migrate_ensure_converge(from); > > > > > > Please move the test change into a separate patch. I had a feeling > 5+1 sec > > > might still easily fail on CIs (even though this test is not yet run). > > > Maybe it needs to still provide a loop so it waits for a few rounds > just in > > > case. > > > > > > > OK. > > > > > > > > > > Thanks, > > > > > > -- > > > Peter Xu > > > > > > > > Thanks, > > Yong > > > > -- > > Best regards > > Sorry for the late reply, The whole suggestions above are OK for me and I'll try that in the next version. > -- > Peter Xu > > Thanks, Yong -- Best regards