On Thu, Feb 22, 2024 at 6:33 PM Peter Xu <pet...@redhat.com> wrote: > > On Wed, Feb 21, 2024 at 06:06:19PM -0300, Fabiano Rosas wrote: > > Hao Xiang <hao.xi...@bytedance.com> writes: > > > > > This change adds a dedicated handler for > > > MigrationOps::ram_save_target_page in > > > > nit: Add a dedicated handler... > > > > Usually "this patch/change" is used only when necessary to avoid > > ambiguity. > > > > > multifd live migration. Now zero page checking can be done in the multifd > > > threads > > > and this becomes the default configuration. We still provide backward > > > compatibility > > > where zero page checking is done from the migration main thread. > > > > > > Signed-off-by: Hao Xiang <hao.xi...@bytedance.com> > > > --- > > > migration/multifd.c | 1 + > > > migration/options.c | 2 +- > > > migration/ram.c | 53 ++++++++++++++++++++++++++++++++++----------- > > > 3 files changed, 42 insertions(+), 14 deletions(-) > > > > > > diff --git a/migration/multifd.c b/migration/multifd.c > > > index fbb40ea10b..ef5dad1019 100644 > > > --- a/migration/multifd.c > > > +++ b/migration/multifd.c > > > @@ -13,6 +13,7 @@ > > > #include "qemu/osdep.h" > > > #include "qemu/cutils.h" > > > > This include... > > > > > #include "qemu/rcu.h" > > > +#include "qemu/cutils.h" > > > > is there already. > > > > > #include "exec/target_page.h" > > > #include "sysemu/sysemu.h" > > > #include "exec/ramblock.h" > > > diff --git a/migration/options.c b/migration/options.c > > > index 3c603391b0..3c79b6ccd4 100644 > > > --- a/migration/options.c > > > +++ b/migration/options.c > > > @@ -181,7 +181,7 @@ Property migration_properties[] = { > > > MIG_MODE_NORMAL), > > > DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", > > > MigrationState, > > > parameters.zero_page_detection, > > > - ZERO_PAGE_DETECTION_LEGACY), > > > + ZERO_PAGE_DETECTION_MULTIFD), > > > > I think we'll need something to avoid a 9.0 -> 8.2 migration with this > > enabled. Otherwise it will go along happily until we get data corruption > > because the new QEMU didn't send any zero pages on the migration thread > > and the old QEMU did not look for them in the multifd packet. > > It could be even worse, as the new QEMU will only attach "normal" pages > after the multifd packet, the old QEMU could read more than it could, > expecting all pages.. > > > > > Perhaps bumping the MULTIFD_VERSION when ZERO_PAGE_DETECTION_MULTIFD is > > in use. We'd just need to fix the test in the new QEMU to check > > (msg.version > MULTIFD_VERSION) instead of (msg.version != MULTIFD_VERSION). > > IMHO we don't need yet to change MULTIFD_VERSION, what we need is perhaps a > compat entry in hw_compat_8_2 setting "zero-page-detection" to "legacy". > We should make sure when "legacy" is set, multifd ran the old protocol > (zero_num will always be 0, and will be ignored by old QEMUs, IIUC). > > One more comment is, when repost please consider split this patch into two; > The new ram_save_target_page_multifd() hook can be done in another patch, > AFAIU.
Sorry, I kept missing this. I will keep telling myself, compatibility is king. I will set the hw_compat_8_2 setting and make sure to test migration 9.0 -> 8.2 fails with "multifd" option set. Will split patches. > > > > > > > > > /* Migration capabilities */ > > > DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE), > > > diff --git a/migration/ram.c b/migration/ram.c > > > index 5ece9f042e..b088c5a98c 100644 > > > --- a/migration/ram.c > > > +++ b/migration/ram.c > > > @@ -1123,10 +1123,6 @@ static int save_zero_page(RAMState *rs, > > > PageSearchStatus *pss, > > > QEMUFile *file = pss->pss_channel; > > > int len = 0; > > > > > > - if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) { > > > - return 0; > > > - } > > > > How does 'none' work now? > > > > > - > > > if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) { > > > return 0; > > > } > > > @@ -1256,6 +1252,10 @@ static int ram_save_page(RAMState *rs, > > > PageSearchStatus *pss) > > > > > > static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset) > > > { > > > + assert(migrate_multifd()); > > > + assert(!migrate_compress()); > > > + assert(!migration_in_postcopy()); > > > > Drop these, please. Keep only the asserts that are likely to trigger > > during development, such as the existing ones at multifd_send_pages. > > > > > + > > > if (!multifd_queue_page(block, offset)) { > > > return -1; > > > } > > > @@ -2046,7 +2046,6 @@ static bool save_compress_page(RAMState *rs, > > > PageSearchStatus *pss, > > > */ > > > static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus > > > *pss) > > > { > > > - RAMBlock *block = pss->block; > > > ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS; > > > int res; > > > > > > @@ -2062,17 +2061,40 @@ static int ram_save_target_page_legacy(RAMState > > > *rs, PageSearchStatus *pss) > > > return 1; > > > } > > > > > > + return ram_save_page(rs, pss); > > > > Look at where git put this! Are you using the default diff algorithm? If > > not try using --patience to see if it improves the diff. > > > > > +} > > > + > > > +/** > > > + * ram_save_target_page_multifd: save one target page > > > + * > > > + * Returns the number of pages written > > > > We could be more precise here: > > > > ram_save_target_page_multifd: send one target page to multifd workers > > > > Returns 1 if the page was queued, -1 otherwise. > > > > > + * > > > + * @rs: current RAM state > > > + * @pss: data about the page we want to send > > > + */ > > > +static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus > > > *pss) > > > +{ > > > + RAMBlock *block = pss->block; > > > + ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS; > > > + > > > + /* Multifd is not compatible with old compression. */ > > > + assert(!migrate_compress()); > > > > This should already be enforced at options.c. > > > > > + > > > + /* Multifd is not compabible with postcopy. */ > > > + assert(!migration_in_postcopy()); > > > > Same here. > > > > > + > > > /* > > > - * Do not use multifd in postcopy as one whole host page should be > > > - * placed. Meanwhile postcopy requires atomic update of pages, so > > > even > > > - * if host page size == guest page size the dest guest during run may > > > - * still see partially copied pages which is data corruption. > > > + * Backward compatibility support. While using multifd live > > > + * migration, we still need to handle zero page checking on the > > > + * migration main thread. > > > */ > > > - if (migrate_multifd() && !migration_in_postcopy()) { > > > - return ram_save_multifd_page(block, offset); > > > + if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) { > > > + if (save_zero_page(rs, pss, offset)) { > > > + return 1; > > > + } > > > } > > > > > > - return ram_save_page(rs, pss); > > > + return ram_save_multifd_page(block, offset); > > > } > > > > > > /* Should be called before sending a host page */ > > > @@ -2984,7 +3006,12 @@ static int ram_save_setup(QEMUFile *f, void > > > *opaque) > > > } > > > > > > migration_ops = g_malloc0(sizeof(MigrationOps)); > > > - migration_ops->ram_save_target_page = ram_save_target_page_legacy; > > > + > > > + if (migrate_multifd()) { > > > + migration_ops->ram_save_target_page = > > > ram_save_target_page_multifd; > > > + } else { > > > + migration_ops->ram_save_target_page = > > > ram_save_target_page_legacy; > > > + } > > > > > > bql_unlock(); > > > ret = multifd_send_sync_main(); > > > > -- > Peter Xu >