"Gonglei (Arei)" <arei.gong...@huawei.com> wrote: > Resizing the xbzrle cache during migration causes qemu-crash, > because the main-thread and migration-thread modify the xbzrle > cache size concurrently without lock-protection. > > Signed-off-by: ChenLiang <chenlian...@huawei.com> > Signed-off-by: Gonglei <arei.gong...@huawei.com> > Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
Sorry for the late review. > --- > Changes against the previous version: > *Remove the temporary variable cache in the xbzrle_cache_resize. > > arch_init.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 55 insertions(+), 3 deletions(-) > > diff --git a/arch_init.c b/arch_init.c > index 80574a0..003640f 100644 > --- a/arch_init.c > +++ b/arch_init.c > @@ -164,26 +164,69 @@ static struct { > uint8_t *encoded_buf; > /* buffer for storing page content */ > uint8_t *current_buf; > - /* Cache for XBZRLE */ > + /* Cache for XBZRLE, Protected by lock. */ > PageCache *cache; > + QemuMutex lock; > } XBZRLE = { > .encoded_buf = NULL, > .current_buf = NULL, > .cache = NULL, > + /* use the lock carefully */ > + .lock = {PTHREAD_MUTEX_INITIALIZER}, > }; Have you tried to compile for windows? I think this would make it break. We need to init it with qemu_mutex_init() in ram_save_setup() after the call to cache_init()? > /* buffer used for XBZRLE decoding */ > static uint8_t *xbzrle_decoded_buf; > > +static void XBZRLE_cache_lock(void) > +{ > + qemu_mutex_lock(&XBZRLE.lock); > +} > + if we change this to: if (migrate_use_xbzrle()) qemu_mutex_lock(&XBZRLE.lock); On both cache operations, we would remove the overhead in the no XBRLE case, right? > +static void XBZRLE_cache_unlock(void) > +{ > + qemu_mutex_unlock(&XBZRLE.lock); > +} > + > int64_t xbzrle_cache_resize(int64_t new_size) > { > + PageCache *new_cache, *old_cache; > + > if (new_size < TARGET_PAGE_SIZE) { > return -1; > } > > + XBZRLE_cache_lock(); > if (XBZRLE.cache != NULL) { > - return cache_resize(XBZRLE.cache, new_size / TARGET_PAGE_SIZE) * > - TARGET_PAGE_SIZE; > + /* check XBZRLE.cache again later */ > + XBZRLE_cache_unlock(); > + if (pow2floor(new_size) == migrate_xbzrle_cache_size()) { > + return pow2floor(new_size); > + } > + new_cache = cache_init(new_size / TARGET_PAGE_SIZE, > + TARGET_PAGE_SIZE); > + if (!new_cache) { > + DPRINTF("Error creating cache\n"); > + return -1; > + } > + > + XBZRLE_cache_lock(); > + /* the XBZRLE.cache may have be destroyed, check it again */ > + if (XBZRLE.cache != NULL) { > + old_cache = XBZRLE.cache; > + XBZRLE.cache = new_cache; > + new_cache = NULL; > + } > + XBZRLE_cache_unlock(); > + > + if (NULL == new_cache) { > + cache_fini(old_cache); > + } else { > + cache_fini(new_cache); > + } > + } else { > + XBZRLE_cache_unlock(); > } Can we change this to: if (pow2floor(new_size) == migrate_xbzrle_cache_size()) { return pow2floor(new_size); } new_cache = cache_init(new_size / TARGET_PAGE_SIZE, TARGET_PAGE_SIZE); if (!new_cache) { DPRINTF("Error creating cache\n"); return -1; } XBZRLE_cache_lock(); /* the XBZRLE.cache may have be destroyed, check it again */ if (XBZRLE.cache != NULL) { cache_to_free = XBZRLE.cache; XBZRLE.cache = new_cache; } else { cache_to_free = new_cache; } XBZRLE_cache_unlock(); cache_fini(cache_to_free); I think that looking is clearer this way. BTW, we are losing _everything_ that is on the cache if we resize it. I don't know if that is what we want. If we have a big cache, and make it bigger because we are having too many misses, just dropping the whole cache looks like a bad idea :-( And my understanding is that we should do a: migrate_get_current()->xbzrle_cache_size = new_size; independently of what CACHE is equal or different from NULL? (this was already wrong before your patch, just asking because you are looking at the code). Thanks, Juan.