> On Oct. 15, 2013, 2:41 p.m., John Burwell wrote:
> > engine/api/src/org/apache/cloudstack/engine/subsystem/api/storage/PrimaryDataStoreDriver.java,
> >  line 28
> > <https://reviews.apache.org/r/14522/diff/2/?file=364638#file364638line28>
> >
> >     I don't know that a flag is necessary to represent this capability 
> > since we already have a revertSnapshot method.  Personally, I dislike the 
> > complexity of exposing methods on an interface that are guarded by another. 
> >  I would prefer to see revertSnapshot enhanced to throw an exception when a 
> > snapshot can't be reverted.  In addition to be a simpler approach, it 
> > covers both the scenario when a device does not support reverting snapshots 
> > and, as well as, the device or the snapshot not being a revertable state.
> 
> Chris Suich wrote:
>     The purpose of canRevertSnapshot() is not to guard revertSnapshot(), but 
> to determine/query whether the driver supports reverting so that the UI can 
> appropriately show/hide the revert snapshot action. I guess in my mind, 
> canRevertSnapshot() should not check things like the state of the snapshot, 
> but things like, is the base volume still on the associated storage (since 
> the base volume could have been moved to a new primary), etc.
> 
> John Burwell wrote:
>     I think we need to find a more generic way to advertise driver 
> capabilities.  Reviewing the rest of the patch, it appears that this method 
> is being used for more than simple driver capability checking.  I need to 
> think through the best approach, but as a strawman, we could consider the 
> following:
>     
>         public interface SnapshotCapability { ... }
>         
>         public enum DefaultSnapshotCapabilities implements StorageCapability {
>             /* other capabilities */
>             REVERT;
>         }
>     
>     This approach would include extracting all of the snapshot operations to 
> a separate Snapshotable interface that would define all of the snapshot 
> operations plus Set<SnapshotCapability> getSnapshotCapabilities().  Drivers 
> that support snapshotting would implement the Snapshotable interface in 
> addition to the PrimaryDataStoreDriver interface.  A simple instanceof check 
> determines whether or not a device can be snapshotted and a query of the 
> getSnapshotCapabilities method to answer more fine grained questions.  The 
> API layer can then translate those capabilities into specific boolean values 
> or we can pass the whole set back to UI for decision making.
>     
>     Additionally, does this method refer to hypervisor or storage-level 
> snapshots?  If it refers to hypervisor snapshots, why is the determination 
> about the ability to revert being made by a storage driver?
> 
> Chris Suich wrote:
>     Are there any other capabilities that would be included in this (other 
> than take snapshot)?
>     
>     This refers only to storage-level snapshots. However, a snapshot strategy 
> overrides the hypervisor snapshot capabilities. So, hypervisor snapshots can 
> be replaced with storage-level snapshots if the storage driver supports it.
> 
> Chris Suich wrote:
>     At this level, though, we're not talking about driver capabilities. We're 
> talking about strategy capabilities. So having the storage driver implement 
> Snapshotable won't do much, because the snapshot strategies are the ones that 
> need to be asked to handle operations.
> 
> John Burwell wrote:
>     Based on your comments, I think we need to rethink the approach a bit.  
> The storage layer should *not* be concerned with the optimal snapshot 
> strategy for a hypervisor/virtual machine instance/volume etc.  It should 
> simply expose a snapshot capability and let the hypervisor layer determine 
> the best approach.  Therefore, I think the whole notion of sorting strategies 
> in the storage layer needs to pushed up to the hypervisor layer where this 
> determination should be made.
>     
>     In terms of capability advertisement, the snapshot capability of a 
> storage device will net down to the capabilities of the underlying driver.  
> Therefore, there must be some consider of the driver model when discussing 
> higher-level storage operations.

There is a problem with that approach. The snapshot workflow of some storage 
vendors, myself included, does not match that of CloudStack. Because of this, 
we need our own SnapshotStrategy to handle our workflow. We should not be 
pushing snapshot strategies to the hypervisor layer because it is not just 
hypervisors that have snapshot strategies.

I will submit a patch tomorrow that removes the changes that were made to the 
storage drivers. In hindsight, they were unnecessary. However, I believe these 
changes pertaining to the snapshot strategy are necessary, either in this form 
or another. How else can we enable plugins to become the authorities on how to 
manage snapshots and ensure their implementations are invoked instead of the 
hypervisor/default?

Were you able to get understand the objective and approach of this through the 
dev list discussion? I'm frustrated that these issues weren't brought up until 
after the code has been written and submitted for review when there has been 
discussion that I thought explained all of this well before this review was 
posted.


> On Oct. 15, 2013, 2:41 p.m., John Burwell wrote:
> > engine/storage/snapshot/src/org/apache/cloudstack/storage/snapshot/SnapshotObject.java,
> >  line 135
> > <https://reviews.apache.org/r/14522/diff/2/?file=364645#file364645line135>
> >
> >     What is the purpose of sorting the strategies?
> 
> Chris Suich wrote:
>     The purpose of sorting the strategies is to ensure that plugin strategies 
> get priority over the hypervisor implementations and default implementations. 
> Additionally, it will also make sure that plugin strategies that cannot 
> handle the operation are at the end of the list, to ensure that hypervisor or 
> default strategies would be given the operation instead of the wrong plugin.
> 
> John Burwell wrote:
>     This priority sorting should be completely encapsulated.  Requiring 
> clients to "know" leaks an important implementation detail that will lead to 
> inconsistent behavior (i.e. bugs).  One approach would be to use a SortedSet 
> with a Comparator implementation that encapsulates the logic.
> 
> Chris Suich wrote:
>     Encapsulated where? The list of strategies needs to be sorted based on 
> the object and operation.
> 
> John Burwell wrote:
>     My concern was that properly using a strategy list required every 
> consumer of the list to "know" that it needed to be sorted.  I want to see 
> this type of logic encapsulated such that consumers of the list can safely 
> assume it is ordered properly.
>     
>     Based on my evolving understanding of the sort operation's purpose, I 
> think we need to revisit this approach.

The order of the list is extremely dynamic. It depends on both the object being 
operated on and the operation to perform. How would you suggest we always 
ensure the list is ordered properly when there is no authority managing that 
list?


> On Oct. 15, 2013, 2:41 p.m., John Burwell wrote:
> > engine/storage/integration-test/test/org/apache/cloudstack/storage/test/SnapshotTest.java,
> >  line 437
> > <https://reviews.apache.org/r/14522/diff/2/?file=364644#file364644line437>
> >
> >     *This comment may be more editorial at this point because of the nature 
> > of the existing code*
> >     I would prefer to see an exception thrown here rather than the 
> > complexity of the canHandle.  I am also concerned that the canHandle method 
> > only checks that the underlying device can take a snapshot without 
> > accounting for whether or not the volume is in a state to take a consistent 
> > snapshot.  I am also concerned that we don't have a semantic for handling 
> > errors that might result from the actual snapshot operation.
> 
> Chris Suich wrote:
>     That is actually the purpose of the canHandle() method - to ONLY check 
> whether the underlying device can handle the operation, not whether the 
> storage is in the right state. For example, if my driver can handle 
> snapshotting, but I am unable to actually perform the snapshot, I still want 
> to be the one asked to take the snapshot so that I am the one to cause the 
> API to fail.
>     
>     Additionally, I'm not sure how we could exclude other strategies from 
> attempting to take my storage and try to snapshot. If there are two snapshot 
> strategies and they are not sorted by canHandle(), then some other strategy 
> may be given the opportunity to take the snapshot and simply fail the API 
> because it cannot actually handle it.
> 
> John Burwell wrote:
>     This logic seems overly complicated.  There can be a number of reasons 
> why a snapshot operation will not succeed for a device.  Among those reasons 
> is that the device is being asked to perform an operation it does not 
> support.  As I understand the operation, a snapshot is a operation directed 
> to a particular volume, Volume A.  When the storage layer is asked to 
> snapshot Volume A, the work will eventually be performed by the associated 
> device driver, Device Driver 1.  If Device Driver 1 does not support the 
> requested operation, then the snapshot operation fails.  There are not other 
> options in this scenario.  Provided my understanding is correct, I don't see 
> the need for an additional canHandle method.  Simply call the method, and if 
> it doesn't throw an exception, the operation succeeded.  I am concerned that 
> the current interface seems to only concern itself with whether or not the 
> driver can do something to the exclusion of whether or not the device is in a 
> state to properly p
 erform the operation.
> 
> Chris Suich wrote:
>     Ok, I think I see what you are saying. Once the strategies are sorted (by 
> calling canHandle()), then we no longer need to check whether the strategy 
> can handle the operation and should simply execute the operation on the top 
> sorted strategy.
>     
>     Is that what you are saying?
> 
> John Burwell wrote:
>     Essentially, yes.  I also concerned with the atomicity of the operation 
> once we properly consider the state of the device.  By calling canHandle, and 
> then performing the operation, we creates a race condition.  Combining the 
> check with the operation allows the system to atomicity check and perform the 
> operation -- eliminating races with other threads performing the same 
> operation.

How can we determine which strategy to use if we do not check whether the 
strategy can handle an operation? The canHandle method should not be doing some 
dynamic check of the state of a the storage to determine whether it can handle 
an operation. It should simply indicate whether a strategy supports the static 
operation. If there is a better way to approach this, then that is fine, but 
the strategies should be able to indicate that. Again, this has nothing to do 
with the underlying storage. Storage drivers that want to be baked into the 
default snapshotting workflow can implement the driver level takeSnapshot() and 
revertSnapshot() method.


> On Oct. 15, 2013, 2:41 p.m., John Burwell wrote:
> > engine/api/src/org/apache/cloudstack/engine/subsystem/api/storage/StrategyPriority.java,
> >  line 46
> > <https://reviews.apache.org/r/14522/diff/2/?file=364641#file364641line46>
> >
> >     How does the result of the canHandle method drive the comparison of two 
> > Snapshots?  Why not just compare the ops to each other other?
> 
> Chris Suich wrote:
>     The 'op' something I introduced to give strategies more flexibility in 
> their canHandle() method. The purpose of sorting the strategies by 
> canHandle() is to ensure that strategies that cannot handle the 'op' are at 
> the end of the list while plugins are at the front of the list (with 
> hypervisors and default strategies in between).
> 
> John Burwell wrote:
>     While a valid approach for UI, it does not seem necessary in the bowels 
> of the driver code.  Given the number of drivers, it would be quicker to 
> simply spin the list once than sort and iterate.  As a knock-on, it will also 
> simplify the method.
> 
> Chris Suich wrote:
>     The issue with this is that the response may not be the same for every 
> snapshot for every driver. For example, if I have the same driver applied to 
> multiple primary storages and there are snapshots for the same volume on each 
> primary storage, then the only snapshots which are revertable are those on 
> the primary storage where the volume currently resides. Some storage drivers 
> may support reverting snapshots from other primary storages, some may not.
> 
> John Burwell wrote:
>     See my comments above.  Now that I understand the purpose of this 
> comparator, it feels like something that is better placed in the hypervisor 
> layaer rather the storage layer.
>     
>     
>

I'm not sure where this hypervisor layer you're talking about is. When requests 
come in for an operation like snapshotting, it goes straight to the manager, 
which dispatches the request to a strategy. Snapshot strategies provide 
alternatives to the hypervisor/default workflow.


> On Oct. 15, 2013, 2:41 p.m., John Burwell wrote:
> > server/src/com/cloud/api/ApiResponseHelper.java, line 461
> > <https://reviews.apache.org/r/14522/diff/2/?file=364650#file364650line461>
> >
> >     If the snapshot was instance of SnapshotInfo and no SnapshotInfo was 
> > found, an exception should thrown.
> >     
> >     Provided my assumption is correct, I suggest refactoring to a structure 
> > such as the following:
> >     
> >     final SnapshotInfo secondaryInfo = (snapshot instanceof SnapshotInfo) ? 
> > (SnapshotInfo) snapshot : snapshotfactory.getSnapshot(snapshot.getId(), 
> > DataStoreRole.Image);
> >     
> >     if (secondaryInfo == null) {
> >        throw new CloudRuntimException(/* Appropriate error message */ );
> >     }
> >     
> >     snapshotResponse.setCanRevert(secondaryInfo.canRevert);
> 
> Chris Suich wrote:
>     Is this how we do things in other APIs? The reason I didn't thrown an 
> exception is that I'd be worried failing an entire API call just because a 
> single snapshot appears to have a broken DB-file on disk relationship. There 
> is probably a better way to go about cleaning it up, but is that worth 
> failing the API over?
> 
> John Burwell wrote:
>     It's a question of fail fast or fail early.  If the snapshot data is 
> corrupt then the operation should not continue.  Failing quickly will provide 
> the operator with a clear explanation of the failure rather than appearing to 
> be successful, but getting a corrupt result.  From a development perspective, 
> we get earlier, more accurate feedback about database corruption.
> 
> Chris Suich wrote:
>     Agreed - that is appropriate for an operator/admin. But what about the 
> end user? If I have a VM and all of a sudden my list of snapshots simply 
> won't load, what can I do about it? Instead, the list could load 
> successfully, but not return that result, mark it as not revertable, etc.
> 
> John Burwell wrote:
>     Failing slowly hurts the end user the most because their backup data is 
> corrupt.  Therefore, we should surface an error message to the end user that 
> causes them their operator/admin.  This type of error should not occur when 
> the system is operating.  In the rare event it does happen, we need to inform 
> the end user as quickly as possible to prevent further corrupt of the 
> snapshot chain.

If this is the approach the community agrees with, I have no problem with that. 
I just hadn't seen a scenario quite like this, so I went with what appeared to 
be the least destructive case - I have no strong feelings either way.


- Chris


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14522/#review27021
-----------------------------------------------------------


On Oct. 14, 2013, 10:50 p.m., Chris Suich wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14522/
> -----------------------------------------------------------
> 
> (Updated Oct. 14, 2013, 10:50 p.m.)
> 
> 
> Review request for cloudstack, Brian Federle, edison su, John Burwell, and 
> Mike Tutkowski.
> 
> 
> Repository: cloudstack-git
> 
> 
> Description
> -------
> 
> After the last batch of work to the revertSnapshot API, SnapshotServiceImpl 
> was not tied into the workflow to be used by storage providers. I have added 
> the logic in a similar fashion to takeSnapshot(), backupSnapshot() and 
> deleteSnapshot().
> 
> I have also added a 'Revert to Snapshot' action to the volume snapshots list 
> in the UI.
> 
> 
> Diffs
> -----
> 
>   api/src/org/apache/cloudstack/api/ApiConstants.java f85784b 
>   api/src/org/apache/cloudstack/api/response/SnapshotResponse.java e9cb109 
>   
> engine/api/src/org/apache/cloudstack/engine/subsystem/api/storage/PrimaryDataStoreDriver.java
>  b124d83 
>   
> engine/api/src/org/apache/cloudstack/engine/subsystem/api/storage/SnapshotInfo.java
>  8d6b760 
>   
> engine/api/src/org/apache/cloudstack/engine/subsystem/api/storage/SnapshotStrategy.java
>  e4cecb6 
>   
> engine/api/src/org/apache/cloudstack/engine/subsystem/api/storage/StrategyPriority.java
>  8bd04f5 
>   
> engine/api/test/org/apache/cloudstack/engine/subsystem/api/storage/StrategyPriorityTest.java
>  329b99f 
>   
> engine/storage/integration-test/test/org/apache/cloudstack/storage/test/FakePrimaryDataStoreDriver.java
>  810afd1 
>   
> engine/storage/integration-test/test/org/apache/cloudstack/storage/test/SnapshotTest.java
>  81f77d6 
>   
> engine/storage/snapshot/src/org/apache/cloudstack/storage/snapshot/SnapshotObject.java
>  3f35e1d 
>   
> engine/storage/snapshot/src/org/apache/cloudstack/storage/snapshot/XenserverSnapshotStrategy.java
>  6a874d6 
>   
> plugins/storage/volume/default/src/org/apache/cloudstack/storage/datastore/driver/CloudStackPrimaryDataStoreDriverImpl.java
>  3eaeb1f 
>   
> plugins/storage/volume/sample/src/org/apache/cloudstack/storage/datastore/driver/SamplePrimaryDataStoreDriverImpl.java
>  ece7b26 
>   
> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/datastore/driver/SolidfirePrimaryDataStoreDriver.java
>  8046b6c 
>   server/src/com/cloud/api/ApiResponseHelper.java f4ca112 
>   server/src/com/cloud/storage/snapshot/SnapshotManagerImpl.java dade983 
>   ui/scripts/storage.js b16f4d4 
> 
> Diff: https://reviews.apache.org/r/14522/diff/
> 
> 
> Testing
> -------
> 
> I have tested all of this locally with a custom storage provider.
> 
> Unfortunately, I'm still in the middle of figuring out how to properly unit 
> test this type of code. If anyone has any recommendations, please let me know.
> 
> 
> Thanks,
> 
> Chris Suich
> 
>

Reply via email to