> -----Original Message----- > From: Dr. David Alan Gilbert [mailto:dgilb...@redhat.com] > Sent: Wednesday, February 19, 2014 6:54 PM > To: Gonglei (Arei) > Cc: qemu-devel@nongnu.org; Peter Maydell; chenliang (T); Juan Quintela; > Luonengjun; owass...@redhat.com; Huangweidong (Hardware) > Subject: Re: [Qemu-devel] [PATCH] XBZRLE: Fix qemu crash when resize the > xbzrle cache during migration > > * Gonglei (Arei) (arei.gong...@huawei.com) wrote: > > Hi Arei, > > > It is likely to crash qemu when resize the xbzrle cache > > during migration. Because the xbzrle cache will be modified > > by migration thread and resize thread. > > Thanks - we was just thinking about this last night after > we hit it. > > I was thinking about doing it by just moving the resize > into the ram save loop; but I think your lock looks about right. > > > Test scene > > step one: set the size of xbzrle cache 1GB. > > step two: migrate vm which dirty memory continuously. > > step three: set the size of xbzrle cache 0.125GB. > > > > Signed-off-by: ChenLiang <chenlian...@huawei.com > > > Signed-off-by: Gonglei <arei.gong...@huawei.com > > > --- > > arch_init.c | 42 > ++++++++++++++++++++++++++++++++++++--- > > include/migration/page_cache.h | 14 +++++++++++++ > > page_cache.c | 13 ++++++++++++ > > 3 files changed, 65 insertions(+), 4 deletions(-) > > > > diff --git a/arch_init.c b/arch_init.c > > index 80574a0..e2d2c72 100644 > > --- a/arch_init.c > > +++ b/arch_init.c > > @@ -164,26 +164,56 @@ static struct { > > uint8_t *encoded_buf; > > /* buffer for storing page content */ > > uint8_t *current_buf; > > - /* Cache for XBZRLE */ > > + /* Cache for XBZRLE, Protected by lock. */ > > PageCache *cache; > > + QemuMutex lock; > > } XBZRLE = { > > .encoded_buf = NULL, > > .current_buf = NULL, > > .cache = NULL, > > }; > > + > > /* buffer used for XBZRLE decoding */ > > static uint8_t *xbzrle_decoded_buf; > > > > +static void XBZRLE_cache_lock(void) > > +{ > > + qemu_mutex_lock(&XBZRLE.lock); > > +} > > + > > +static void XBZRLE_cache_unlock(void) > > +{ > > + qemu_mutex_unlock(&XBZRLE.lock); > > +} > > + > > + > > int64_t xbzrle_cache_resize(int64_t new_size) > > { > > + PageCache *new_cache, *old_cache; > > + > > if (new_size < TARGET_PAGE_SIZE) { > > return -1; > > } > > > > if (XBZRLE.cache != NULL) { > > - return cache_resize(XBZRLE.cache, new_size / > TARGET_PAGE_SIZE) * > > - TARGET_PAGE_SIZE; > > + if (pow2floor(new_size) == cache_max_num_items(XBZRLE.cache) > > + > *TARGET_PAGE_SIZE) { > > + goto ret; > > + } > > + new_cache = cache_init(new_size / TARGET_PAGE_SIZE, > > + > cache_page_size(XBZRLE.cache)); > > I wonder whether it's safe to even check the size of the XBZRLE.cache here; > it's a short race, but I think if you're unlucky and migration completes > between the NULL check and here then it would break. > Also 'migration_end' calls cache_fini, so it probably also needs to have > the lock around it.
Thanks, I will think about that, and commit a new patch to handle the issue that you refer to. > > > + if (!new_cache) { > > + DPRINTF("Error creating cache\n"); > > + return -1; > > + } > > + XBZRLE_cache_lock(); > > + old_cache = XBZRLE.cache; > > + XBZRLE.cache = new_cache; > > + XBZRLE_cache_unlock(); > > + cache_fini(old_cache); > > + goto ret; > > } > > +ret: > > return pow2floor(new_size); > > } > > > > @@ -522,6 +552,8 @@ static int ram_save_block(QEMUFile *f, bool > last_stage) > > ret = ram_control_save_page(f, block->offset, > > offset, TARGET_PAGE_SIZE, > &bytes_sent); > > > > + XBZRLE_cache_lock(); > > + > > if (ret != RAM_SAVE_CONTROL_NOT_SUPP) { > > if (ret != RAM_SAVE_CONTROL_DELAYED) { > > if (bytes_sent > 0) { > > @@ -553,6 +585,8 @@ static int ram_save_block(QEMUFile *f, bool > last_stage) > > acct_info.norm_pages++; > > } > > > > + XBZRLE_cache_unlock(); > > + > > /* if page is unmodified, continue to the next */ > > if (bytes_sent > 0) { > > last_sent_block = block; > > @@ -681,7 +715,7 @@ static int ram_save_setup(QEMUFile *f, void > *opaque) > > XBZRLE.encoded_buf = NULL; > > return -1; > > } > > - > > + qemu_mutex_init(&XBZRLE.lock); > > acct_clear(); > > } > > > > diff --git a/include/migration/page_cache.h > b/include/migration/page_cache.h > > index d156f0d..6be0103 100644 > > --- a/include/migration/page_cache.h > > +++ b/include/migration/page_cache.h > > @@ -79,4 +79,18 @@ int cache_insert(PageCache *cache, uint64_t addr, > uint8_t *pdata); > > */ > > int64_t cache_resize(PageCache *cache, int64_t num_pages); > > > > +/** > > + * cache_max_num_items: return the cache max num items. > > + * > > + * @cache pointer to the PageCache struct > > + */ > > +int64_t cache_max_num_items(PageCache *cache); > > + > > +/** > > + * cache_page_size: return the cache page size > > + * > > + * @cache pointer to the PageCache struct > > + */ > > +unsigned int cache_page_size(PageCache *cache); > > + > > #endif > > diff --git a/page_cache.c b/page_cache.c > > index 3ef6ee7..92e401c 100644 > > --- a/page_cache.c > > +++ b/page_cache.c > > @@ -234,3 +234,16 @@ int64_t cache_resize(PageCache *cache, int64_t > new_num_pages) > > > > return cache->max_num_items; > > } > > + > > +int64_t cache_max_num_items(PageCache *cache) > > +{ > > + g_assert(cache); > > + return cache->max_num_items; > > +} > > + > > +unsigned int cache_page_size(PageCache *cache) > > +{ > > + g_assert(cache); > > + return cache->page_size; > > +} > > + > > -- > > 1.6.0.2 > > > > > > Best regards, > > -Gonglei > > > -- > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK