> On Oct. 15, 2013, 10:41 a.m., John Burwell wrote:
> > engine/api/src/org/apache/cloudstack/engine/subsystem/api/storage/PrimaryDataStoreDriver.java,
> >  line 28
> > <https://reviews.apache.org/r/14522/diff/2/?file=364638#file364638line28>
> >
> >     I don't know that a flag is necessary to represent this capability 
> > since we already have a revertSnapshot method.  Personally, I dislike the 
> > complexity of exposing methods on an interface that are guarded by another. 
> >  I would prefer to see revertSnapshot enhanced to throw an exception when a 
> > snapshot can't be reverted.  In addition to be a simpler approach, it 
> > covers both the scenario when a device does not support reverting snapshots 
> > and, as well as, the device or the snapshot not being a revertable state.
> 
> Chris Suich wrote:
>     The purpose of canRevertSnapshot() is not to guard revertSnapshot(), but 
> to determine/query whether the driver supports reverting so that the UI can 
> appropriately show/hide the revert snapshot action. I guess in my mind, 
> canRevertSnapshot() should not check things like the state of the snapshot, 
> but things like, is the base volume still on the associated storage (since 
> the base volume could have been moved to a new primary), etc.
> 
> John Burwell wrote:
>     I think we need to find a more generic way to advertise driver 
> capabilities.  Reviewing the rest of the patch, it appears that this method 
> is being used for more than simple driver capability checking.  I need to 
> think through the best approach, but as a strawman, we could consider the 
> following:
>     
>         public interface SnapshotCapability { ... }
>         
>         public enum DefaultSnapshotCapabilities implements StorageCapability {
>             /* other capabilities */
>             REVERT;
>         }
>     
>     This approach would include extracting all of the snapshot operations to 
> a separate Snapshotable interface that would define all of the snapshot 
> operations plus Set<SnapshotCapability> getSnapshotCapabilities().  Drivers 
> that support snapshotting would implement the Snapshotable interface in 
> addition to the PrimaryDataStoreDriver interface.  A simple instanceof check 
> determines whether or not a device can be snapshotted and a query of the 
> getSnapshotCapabilities method to answer more fine grained questions.  The 
> API layer can then translate those capabilities into specific boolean values 
> or we can pass the whole set back to UI for decision making.
>     
>     Additionally, does this method refer to hypervisor or storage-level 
> snapshots?  If it refers to hypervisor snapshots, why is the determination 
> about the ability to revert being made by a storage driver?
> 
> Chris Suich wrote:
>     Are there any other capabilities that would be included in this (other 
> than take snapshot)?
>     
>     This refers only to storage-level snapshots. However, a snapshot strategy 
> overrides the hypervisor snapshot capabilities. So, hypervisor snapshots can 
> be replaced with storage-level snapshots if the storage driver supports it.
> 
> Chris Suich wrote:
>     At this level, though, we're not talking about driver capabilities. We're 
> talking about strategy capabilities. So having the storage driver implement 
> Snapshotable won't do much, because the snapshot strategies are the ones that 
> need to be asked to handle operations.

Based on your comments, I think we need to rethink the approach a bit.  The 
storage layer should *not* be concerned with the optimal snapshot strategy for 
a hypervisor/virtual machine instance/volume etc.  It should simply expose a 
snapshot capability and let the hypervisor layer determine the best approach.  
Therefore, I think the whole notion of sorting strategies in the storage layer 
needs to pushed up to the hypervisor layer where this determination should be 
made.

In terms of capability advertisement, the snapshot capability of a storage 
device will net down to the capabilities of the underlying driver.  Therefore, 
there must be some consider of the driver model when discussing higher-level 
storage operations.


> On Oct. 15, 2013, 10:41 a.m., John Burwell wrote:
> > engine/api/src/org/apache/cloudstack/engine/subsystem/api/storage/StrategyPriority.java,
> >  line 46
> > <https://reviews.apache.org/r/14522/diff/2/?file=364641#file364641line46>
> >
> >     How does the result of the canHandle method drive the comparison of two 
> > Snapshots?  Why not just compare the ops to each other other?
> 
> Chris Suich wrote:
>     The 'op' something I introduced to give strategies more flexibility in 
> their canHandle() method. The purpose of sorting the strategies by 
> canHandle() is to ensure that strategies that cannot handle the 'op' are at 
> the end of the list while plugins are at the front of the list (with 
> hypervisors and default strategies in between).
> 
> John Burwell wrote:
>     While a valid approach for UI, it does not seem necessary in the bowels 
> of the driver code.  Given the number of drivers, it would be quicker to 
> simply spin the list once than sort and iterate.  As a knock-on, it will also 
> simplify the method.
> 
> Chris Suich wrote:
>     The issue with this is that the response may not be the same for every 
> snapshot for every driver. For example, if I have the same driver applied to 
> multiple primary storages and there are snapshots for the same volume on each 
> primary storage, then the only snapshots which are revertable are those on 
> the primary storage where the volume currently resides. Some storage drivers 
> may support reverting snapshots from other primary storages, some may not.

See my comments above.  Now that I understand the purpose of this comparator, 
it feels like something that is better placed in the hypervisor layaer rather 
the storage layer.


> On Oct. 15, 2013, 10:41 a.m., John Burwell wrote:
> > engine/storage/integration-test/test/org/apache/cloudstack/storage/test/SnapshotTest.java,
> >  line 437
> > <https://reviews.apache.org/r/14522/diff/2/?file=364644#file364644line437>
> >
> >     *This comment may be more editorial at this point because of the nature 
> > of the existing code*
> >     I would prefer to see an exception thrown here rather than the 
> > complexity of the canHandle.  I am also concerned that the canHandle method 
> > only checks that the underlying device can take a snapshot without 
> > accounting for whether or not the volume is in a state to take a consistent 
> > snapshot.  I am also concerned that we don't have a semantic for handling 
> > errors that might result from the actual snapshot operation.
> 
> Chris Suich wrote:
>     That is actually the purpose of the canHandle() method - to ONLY check 
> whether the underlying device can handle the operation, not whether the 
> storage is in the right state. For example, if my driver can handle 
> snapshotting, but I am unable to actually perform the snapshot, I still want 
> to be the one asked to take the snapshot so that I am the one to cause the 
> API to fail.
>     
>     Additionally, I'm not sure how we could exclude other strategies from 
> attempting to take my storage and try to snapshot. If there are two snapshot 
> strategies and they are not sorted by canHandle(), then some other strategy 
> may be given the opportunity to take the snapshot and simply fail the API 
> because it cannot actually handle it.
> 
> John Burwell wrote:
>     This logic seems overly complicated.  There can be a number of reasons 
> why a snapshot operation will not succeed for a device.  Among those reasons 
> is that the device is being asked to perform an operation it does not 
> support.  As I understand the operation, a snapshot is a operation directed 
> to a particular volume, Volume A.  When the storage layer is asked to 
> snapshot Volume A, the work will eventually be performed by the associated 
> device driver, Device Driver 1.  If Device Driver 1 does not support the 
> requested operation, then the snapshot operation fails.  There are not other 
> options in this scenario.  Provided my understanding is correct, I don't see 
> the need for an additional canHandle method.  Simply call the method, and if 
> it doesn't throw an exception, the operation succeeded.  I am concerned that 
> the current interface seems to only concern itself with whether or not the 
> driver can do something to the exclusion of whether or not the device is in a 
> state to properly p
 erform the operation.
> 
> Chris Suich wrote:
>     Ok, I think I see what you are saying. Once the strategies are sorted (by 
> calling canHandle()), then we no longer need to check whether the strategy 
> can handle the operation and should simply execute the operation on the top 
> sorted strategy.
>     
>     Is that what you are saying?

Essentially, yes.  I also concerned with the atomicity of the operation once we 
properly consider the state of the device.  By calling canHandle, and then 
performing the operation, we creates a race condition.  Combining the check 
with the operation allows the system to atomicity check and perform the 
operation -- eliminating races with other threads performing the same operation.


> On Oct. 15, 2013, 10:41 a.m., John Burwell wrote:
> > engine/storage/snapshot/src/org/apache/cloudstack/storage/snapshot/SnapshotObject.java,
> >  line 135
> > <https://reviews.apache.org/r/14522/diff/2/?file=364645#file364645line135>
> >
> >     What is the purpose of sorting the strategies?
> 
> Chris Suich wrote:
>     The purpose of sorting the strategies is to ensure that plugin strategies 
> get priority over the hypervisor implementations and default implementations. 
> Additionally, it will also make sure that plugin strategies that cannot 
> handle the operation are at the end of the list, to ensure that hypervisor or 
> default strategies would be given the operation instead of the wrong plugin.
> 
> John Burwell wrote:
>     This priority sorting should be completely encapsulated.  Requiring 
> clients to "know" leaks an important implementation detail that will lead to 
> inconsistent behavior (i.e. bugs).  One approach would be to use a SortedSet 
> with a Comparator implementation that encapsulates the logic.
> 
> Chris Suich wrote:
>     Encapsulated where? The list of strategies needs to be sorted based on 
> the object and operation.

My concern was that properly using a strategy list required every consumer of 
the list to "know" that it needed to be sorted.  I want to see this type of 
logic encapsulated such that consumers of the list can safely assume it is 
ordered properly.

Based on my evolving understanding of the sort operation's purpose, I think we 
need to revisit this approach.


> On Oct. 15, 2013, 10:41 a.m., John Burwell wrote:
> > server/src/com/cloud/api/ApiResponseHelper.java, line 461
> > <https://reviews.apache.org/r/14522/diff/2/?file=364650#file364650line461>
> >
> >     If the snapshot was instance of SnapshotInfo and no SnapshotInfo was 
> > found, an exception should thrown.
> >     
> >     Provided my assumption is correct, I suggest refactoring to a structure 
> > such as the following:
> >     
> >     final SnapshotInfo secondaryInfo = (snapshot instanceof SnapshotInfo) ? 
> > (SnapshotInfo) snapshot : snapshotfactory.getSnapshot(snapshot.getId(), 
> > DataStoreRole.Image);
> >     
> >     if (secondaryInfo == null) {
> >        throw new CloudRuntimException(/* Appropriate error message */ );
> >     }
> >     
> >     snapshotResponse.setCanRevert(secondaryInfo.canRevert);
> 
> Chris Suich wrote:
>     Is this how we do things in other APIs? The reason I didn't thrown an 
> exception is that I'd be worried failing an entire API call just because a 
> single snapshot appears to have a broken DB-file on disk relationship. There 
> is probably a better way to go about cleaning it up, but is that worth 
> failing the API over?
> 
> John Burwell wrote:
>     It's a question of fail fast or fail early.  If the snapshot data is 
> corrupt then the operation should not continue.  Failing quickly will provide 
> the operator with a clear explanation of the failure rather than appearing to 
> be successful, but getting a corrupt result.  From a development perspective, 
> we get earlier, more accurate feedback about database corruption.
> 
> Chris Suich wrote:
>     Agreed - that is appropriate for an operator/admin. But what about the 
> end user? If I have a VM and all of a sudden my list of snapshots simply 
> won't load, what can I do about it? Instead, the list could load 
> successfully, but not return that result, mark it as not revertable, etc.

Failing slowly hurts the end user the most because their backup data is 
corrupt.  Therefore, we should surface an error message to the end user that 
causes them their operator/admin.  This type of error should not occur when the 
system is operating.  In the rare event it does happen, we need to inform the 
end user as quickly as possible to prevent further corrupt of the snapshot 
chain.


- John


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14522/#review27021
-----------------------------------------------------------


On Oct. 14, 2013, 6:50 p.m., Chris Suich wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14522/
> -----------------------------------------------------------
> 
> (Updated Oct. 14, 2013, 6:50 p.m.)
> 
> 
> Review request for cloudstack, Brian Federle, edison su, John Burwell, and 
> Mike Tutkowski.
> 
> 
> Repository: cloudstack-git
> 
> 
> Description
> -------
> 
> After the last batch of work to the revertSnapshot API, SnapshotServiceImpl 
> was not tied into the workflow to be used by storage providers. I have added 
> the logic in a similar fashion to takeSnapshot(), backupSnapshot() and 
> deleteSnapshot().
> 
> I have also added a 'Revert to Snapshot' action to the volume snapshots list 
> in the UI.
> 
> 
> Diffs
> -----
> 
>   api/src/org/apache/cloudstack/api/ApiConstants.java f85784b 
>   api/src/org/apache/cloudstack/api/response/SnapshotResponse.java e9cb109 
>   
> engine/api/src/org/apache/cloudstack/engine/subsystem/api/storage/PrimaryDataStoreDriver.java
>  b124d83 
>   
> engine/api/src/org/apache/cloudstack/engine/subsystem/api/storage/SnapshotInfo.java
>  8d6b760 
>   
> engine/api/src/org/apache/cloudstack/engine/subsystem/api/storage/SnapshotStrategy.java
>  e4cecb6 
>   
> engine/api/src/org/apache/cloudstack/engine/subsystem/api/storage/StrategyPriority.java
>  8bd04f5 
>   
> engine/api/test/org/apache/cloudstack/engine/subsystem/api/storage/StrategyPriorityTest.java
>  329b99f 
>   
> engine/storage/integration-test/test/org/apache/cloudstack/storage/test/FakePrimaryDataStoreDriver.java
>  810afd1 
>   
> engine/storage/integration-test/test/org/apache/cloudstack/storage/test/SnapshotTest.java
>  81f77d6 
>   
> engine/storage/snapshot/src/org/apache/cloudstack/storage/snapshot/SnapshotObject.java
>  3f35e1d 
>   
> engine/storage/snapshot/src/org/apache/cloudstack/storage/snapshot/XenserverSnapshotStrategy.java
>  6a874d6 
>   
> plugins/storage/volume/default/src/org/apache/cloudstack/storage/datastore/driver/CloudStackPrimaryDataStoreDriverImpl.java
>  3eaeb1f 
>   
> plugins/storage/volume/sample/src/org/apache/cloudstack/storage/datastore/driver/SamplePrimaryDataStoreDriverImpl.java
>  ece7b26 
>   
> plugins/storage/volume/solidfire/src/org/apache/cloudstack/storage/datastore/driver/SolidfirePrimaryDataStoreDriver.java
>  8046b6c 
>   server/src/com/cloud/api/ApiResponseHelper.java f4ca112 
>   server/src/com/cloud/storage/snapshot/SnapshotManagerImpl.java dade983 
>   ui/scripts/storage.js b16f4d4 
> 
> Diff: https://reviews.apache.org/r/14522/diff/
> 
> 
> Testing
> -------
> 
> I have tested all of this locally with a custom storage provider.
> 
> Unfortunately, I'm still in the middle of figuring out how to properly unit 
> test this type of code. If anyone has any recommendations, please let me know.
> 
> 
> Thanks,
> 
> Chris Suich
> 
>

Reply via email to