-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26263/
-----------------------------------------------------------

Review request for cloudstack, edison su and Min Chen.


Summary (updated)
-----------------

CLOUDSTACK-7539: [S3] Parallel deployment makes reference count of a cache in 
nfs secondary staging store negative(-1)


Bugs: CLOUDSTACK-7539
    https://issues.apache.org/jira/browse/CLOUDSTACK-7539


Repository: cloudstack-git


Description (updated)
-------

Root causes of CLOUDSTACK-7539 are below.

a. createCacheObject() method lacks mutual exclusion. After calling
   this method, threads check cache existence but the procedure is not
   atomic. So, other threads may call this method and decide to create
   a cache in spite of cache copy being in progress.

b. createCacheObject() method ignores states except READY when it
   checks cache existence. Then, it is considered cache doesn't exist
   in spite of cache copy being in progress.

c. When multiple cache entries about a same image are in a database,
   threads may get a wrong cache entry because of incorrect search
   condition that randomly chooses one entry from entries that have
   same template id. As a result of this retrieval, reference counter
   of the cache probably go negative. (I'm not sure about this
   behavior but I suspect CloudStack behaves like this.)

These behaviors make cache state incorrect and inconsistent.

For fixing this bug, I modified cache creation behavior when no cache
is on NFS Secondary Staging Store(SSS). As sequential deployment
functions so far, I think parallel deployment works well when it is
managed like sequential deployment.

Fixed points are below. These are for cause a. and b., not including
fix for cause c.

1. Protect critical section between cache search and allocation by lock.

   When management server threads reach createCacheObject() method,
   threads get lock at first and check cache existence. In case that no
   cache is on NFS SSS, one thread sends a request for creating cache.
   In other case, threads wait for completion of creating cache not to
   duplicate same image.

2. Add several condition check for cache state.

   If a thread is creating a cache of an image, other threads mustn't
   send no additional request for the same image. Therefore, threads have
   to consider whether a copy request should be issued by those state
   check.


Diffs (updated)
-----

  
engine/storage/cache/src/org/apache/cloudstack/storage/cache/manager/StorageCacheManagerImpl.java
 7c74729 

Diff: https://reviews.apache.org/r/26263/diff/


Testing (updated)
-------

I tested deployment of two instances at almost the same time. Results
are below.

- Succeed in deployment of two instances.
- No database cache entries are duplicated.
- No duplicated images are on NFS SSS.
- Reference counter never goes to -1.


Thanks,

Hiroki Ohashi

Reply via email to