Re: [Qemu-devel] [PATCH v3] XBZRLE: Fix qemu crash when resize the xbzrle cache

Juan Quintela Tue, 25 Feb 2014 07:26:31 -0800

"Gonglei (Arei)" <arei.gong...@huawei.com> wrote:
> Resizing the xbzrle cache during migration causes qemu-crash,
> because the main-thread and migration-thread modify the xbzrle
> cache size concurrently without lock-protection.
>
> Signed-off-by: ChenLiang <chenlian...@huawei.com>
> Signed-off-by: Gonglei <arei.gong...@huawei.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com>


Sorry for the late review.

> ---
> Changes against the previous version:
>  *Remove the temporary variable cache in the xbzrle_cache_resize.
>
>  arch_init.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 55 insertions(+), 3 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 80574a0..003640f 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -164,26 +164,69 @@ static struct {
>      uint8_t *encoded_buf;
>      /* buffer for storing page content */
>      uint8_t *current_buf;
> -    /* Cache for XBZRLE */
> +    /* Cache for XBZRLE, Protected by lock. */
>      PageCache *cache;
> +    QemuMutex lock;
>  } XBZRLE = {
>      .encoded_buf = NULL,
>      .current_buf = NULL,
>      .cache = NULL,
> +    /* use the lock carefully */
> +    .lock = {PTHREAD_MUTEX_INITIALIZER},
>  };

Have you tried to compile for windows?  I think this would make it
break.  We need to init it with qemu_mutex_init() in ram_save_setup()
after the call to cache_init()?



>  /* buffer used for XBZRLE decoding */
>  static uint8_t *xbzrle_decoded_buf;
>  
> +static void XBZRLE_cache_lock(void)
> +{
> +    qemu_mutex_lock(&XBZRLE.lock);
> +}
> +

if we change this to:

    if (migrate_use_xbzrle())
        qemu_mutex_lock(&XBZRLE.lock);

On both cache operations, we would remove the overhead in the no XBRLE
case, right?

> +static void XBZRLE_cache_unlock(void)
> +{
> +    qemu_mutex_unlock(&XBZRLE.lock);
> +}
> +
>  int64_t xbzrle_cache_resize(int64_t new_size)
>  {
> +    PageCache *new_cache, *old_cache;
> +
>      if (new_size < TARGET_PAGE_SIZE) {
>          return -1;
>      }
>  
> +    XBZRLE_cache_lock();
>      if (XBZRLE.cache != NULL) {
> -        return cache_resize(XBZRLE.cache, new_size / TARGET_PAGE_SIZE) *
> -            TARGET_PAGE_SIZE;
> +        /* check XBZRLE.cache again later */
> +        XBZRLE_cache_unlock();
> +        if (pow2floor(new_size) == migrate_xbzrle_cache_size()) {
> +            return pow2floor(new_size);
> +        }
> +        new_cache = cache_init(new_size / TARGET_PAGE_SIZE,
> +                                        TARGET_PAGE_SIZE);
> +        if (!new_cache) {
> +            DPRINTF("Error creating cache\n");
> +            return -1;
> +        }
> +
> +        XBZRLE_cache_lock();
> +        /* the XBZRLE.cache may have be destroyed, check it again */
> +        if (XBZRLE.cache != NULL) {
> +            old_cache = XBZRLE.cache;
> +            XBZRLE.cache = new_cache;
> +            new_cache = NULL;
> +        }
> +        XBZRLE_cache_unlock();
> +
> +        if (NULL == new_cache) {
> +            cache_fini(old_cache);
> +        } else {
> +            cache_fini(new_cache);
> +        }
> +    } else {
> +        XBZRLE_cache_unlock();
>      }

Can we change this to:

        if (pow2floor(new_size) == migrate_xbzrle_cache_size()) {
            return pow2floor(new_size);
        }

        new_cache = cache_init(new_size / TARGET_PAGE_SIZE,
                                        TARGET_PAGE_SIZE);
        if (!new_cache) {
            DPRINTF("Error creating cache\n");
            return -1;
        }

        XBZRLE_cache_lock();
        /* the XBZRLE.cache may have be destroyed, check it again */
        if (XBZRLE.cache != NULL) {
            cache_to_free = XBZRLE.cache;
            XBZRLE.cache = new_cache;
        } else {
            cache_to_free = new_cache;
        }
        XBZRLE_cache_unlock();

        cache_fini(cache_to_free);

I think that looking is clearer this way.  BTW, we are losing
_everything_ that is on the cache if we resize it.  I don't know if that
is what we want.  If we have a big cache, and make it bigger because we
are having too many misses, just dropping the whole cache looks like a
bad idea :-(


And my understanding is that we should do a:

migrate_get_current()->xbzrle_cache_size = new_size;

independently of what CACHE is equal or different from NULL?
(this was already wrong before your patch, just asking because you are
looking at the code).

Thanks, Juan.

Re: [Qemu-devel] [PATCH v3] XBZRLE: Fix qemu crash when resize the xbzrle cache

Reply via email to