> -----Original Message-----
> From: Wido den Hollander [mailto:w...@widodh.nl]
> Sent: Friday, January 18, 2013 12:51 AM
> To: cloudstack-dev@incubator.apache.org
> Subject: Re: new storage framework update
>
> Hi,
>
> On 01/16/2013 02:35 AM, Edison Su wrote:
> > After a lengthy discussion (more than two hours) with John on Skype, I think we figured out the difference between us. The API proposed by John is more at the execution level, which is where the input/output streams come from; it assumes that both the source and destination object will be operated on at the same place (either inside the ssvm, or on the hypervisor host). The API I proposed is more about how to hook a vendor's own storage into cloudstack's mgt server, and thus can replace the process of how and where to operate on the storage.
> >
> > Let's talk about the execution model first, which has a huge impact on the design. The execution model is about where to execute operations issued by the mgt server. Currently, there is no universal execution model; it's quite different for each hypervisor.
> >
> > E.g. for KVM, the mgt server sends commands to the KVM host; there is a java agent running on the kvm host which can execute commands sent by the mgt server.
> >
> > For xenserver, most commands are executed on the mgt server, which calls xapi, which then talks to the xenserver host. But we do put some python code on the xenserver host for operations not supported by xapi.
> >
> > For vmware, most commands are executed on the mgt server, which talks to the vcenter API, while some of them are executed inside the SSVM.
> >
> > Due to the different execution models, we run into a problem about how and where to access a storage device. For example, there is a storage box which has its own management API. Now I want to create a volume on the storage box: where should I call the storage box's create volume api? If we follow the above execution models, we need to call the api in different places and, even worse, write the API call in different languages. For kvm, you may need to write java code in the kvm agent; for xenserver, you may need to write a xapi python plugin; for vmware, you may need to put the java code inside the ssvm, etc.
> >
> > But if the storage box already has a management api, why not just call it inside the cloudstack mgt server, so the device vendor only writes java code once, for all the different hypervisors? If we don't enforce the execution model, then the storage framework should have a hook in the management server, and the device vendor can decide where to execute commands sent by the mgt server.
>
> With this you are assuming that the management server always has access to the API of the storage box?
>
> What if the management server is in network X (say Amsterdam) and I have a zone in London where my storage box X is on a private network?
>
> The only one that can access the API then is the hypervisor, so the calls have to go through there.
>
> I don't want to encourage people to write "stupid code" where they assume that the management server is this thing which is tied into every network.
I think we will change the current mgt server deployment model to a cluster of mgt servers per zone, instead of one cluster of mgt servers managing all zones: https://cwiki.apache.org/confluence/display/CLOUDSTACK/AWS-Style+Regions
If the above works, then the mgt server can assume it can access the storage box's API. BTW, the mgt server already needs to access some private mgt APIs, such as F5/netscaler etc.

>
> Wido
>
> > That's what my datastoredriver layer is used for. Take the take-snapshot sequence diagram as an example: https://cwiki.apache.org/confluence/download/attachments/30741569/take+snapshot+sequence.png?version=1&modificationDate=1358189965000
> > The datastoredriver runs inside the mgt server, while the datastoredriver itself can decide where to execute the "takeSnapshot" API: the driver can send a command to the hypervisor host, directly call the storage box's API, directly call the hypervisor's own API, or call another service running outside of the cloudstack mgt server. It's all up to the implementation of the driver.
> > Does it make sense? If so, the device driver should not take an input/output stream as a parameter, as that enforces the execution model, which I don't think is necessary.
> > BTW, John and I will discuss the matter tomorrow on Skype; if you want to join, please let me know.
> >
> >> -----Original Message-----
> >> From: Edison Su [mailto:edison...@citrix.com]
> >> Sent: Monday, January 14, 2013 3:19 PM
> >> To: cloudstack-dev@incubator.apache.org
> >> Subject: RE: new storage framework update
> >>
> >>> -----Original Message-----
> >>> From: John Burwell [mailto:jburw...@basho.com]
> >>> Sent: Friday, January 11, 2013 12:30 PM
> >>> To: cloudstack-dev@incubator.apache.org
> >>> Subject: Re: new storage framework update
> >>>
> >>> Edison,
> >>>
> >>> I think we are speaking past each other a bit. My intention is to separate logical and physical storage operations in order to simplify the implementation of new storage providers. Also, in order to support the widest range of storage mechanisms, I want to eliminate all interface assumptions (implied and explicit) that a storage device supports a file system.
> >>
> >> I think if the nfs secondary storage is optional, then all the inefficiency related to object storage will go away?
> >>
> >>> These two issues make the implementation of efficient storage drivers extremely difficult. For example, for object stores, we have to create polling synchronization threads that add complexity, overhead, and latency to the system. If we could connect the OutputStream of a source (such as an HTTP upload) to the InputStream of the object store, transfer operations would be far simpler and more efficient. The conflation of logical and physical operations also makes it difficult to reliably and maintainably implement cross-cutting storage features such as at-rest encryption. In my opinion, the current design in Javelin makes progress on the first point, but does not address the second point. Therefore, I propose that we refine the design to explicitly separate logical and physical operations and utilize the higher-level I/O abstractions provided by the JDK to remove any interface requirements for file-based operations.
> >>>
> >>> Based on these goals, I propose keeping the logical Image, ImageMotion, Volume, Template, and Snapshot services.
> >>> These services would be responsible for logical storage operations (e.g. createVolumeFromTemplate, downloadTemplate, createSnapshot, deleteSnapshot, etc). To perform physical operations, the StorageDevice concept would be added with the following operations:
> >>>
> >>> * void read(URI aURI, OutputStream anOutputStream) throws IOException
> >>> * void write(URI aURI, InputStream anInputStream) throws IOException
> >>> * Set<URI> list(URI aURI) throws IOException
> >>> * boolean delete(URI aURI) throws IOException
> >>> * StorageDeviceType getType()
> >>
> >> I agree with your simplified interface, but I am still cautious that a simple URI may not be enough. For example, at the driver level, what if the driver developer wants to know extra information about the object being operated on? I ended up with new APIs like: https://cwiki.apache.org/confluence/download/attachments/30741569/provider.jpg?version=1&modificationDate=1358168083079
> >> At the driver level, it works on two interfaces:
> >> DataObject, which is the interface of volume/snapshot/template.
> >> DataStore, which is the interface of all the primary storages or image storages.
> >> The API looks pretty much like what you proposed:
> >> grantAccess(DataObject, EndPoint ep): makes the object accessible to an endpoint, and returns a URI representing the object. This is used when moving the object between different storages. For example, in the create-volume-from-template sequence diagram, https://cwiki.apache.org/confluence/download/attachments/30741569/createvolumeFromtemplate.png?version=1&modificationDate=1358172931767, the datamotionstrategy calls grantAccess on both the source and destination datastore to get two URIs representing the source and destination objects, then sends the URIs to an endpoint (it can be the agent running inside the ssvm, or a hypervisor host) to conduct the actual copy operation.
> >> revokeAccess: the opposite of the above API.
> >> listObjects(DataStore): list the objects on a datastore.
> >> createAsync(DataObject): create an object on a datastore. The driver shouldn't care what kind of object it is; it should only care about the size of the object and the data store of the object, all of which can be directly inferred from the DataObject. If the driver needs more information about the object, the driver developer can get the id of the object, query the database, and find out more. This interface makes no assumption about the underlying storage; it can be primary storage, s3/swift, an ftp server, or whatever writable storage.
> >> deleteAsync(DataObject): delete an object on a datastore, the opposite of createAsync.
> >> copyAsync(DataObject, DataObject): copy the src object to the dest object. It's for storage migration. Some storage vendors or hypervisors have their own efficient way to migrate storage from one place to another. Most of the time, migration across different vendors or different storage types (primary <=> image storage) needs to go through the datamotionservice, which will be covered later.
> >> canCopy(DataObject, DataObject): helps the datamotionservice make the decision on storage migration.
> >>
> >> For the primary storage driver, there are two extra APIs:
> >> takeSnapshot(SnapshotInfo snapshot): take a snapshot.
> >> revertSnapshot(SnapshotInfo snapshot): revert a snapshot.
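Pulling the driver-level proposal above together, here is a minimal Java sketch of what those interfaces could look like. The Future-based async return type, the exact getter set on DataObject/DataStore, and the package-free layout are assumptions made for illustration; they are not the actual javelin-branch code.

    // Illustrative sketch only -- method names come from the thread, signatures are assumed.
    import java.net.URI;
    import java.util.List;
    import java.util.concurrent.Future;

    interface DataObject {
        long getId();              // id to look up extra details in the database if needed
        long getSize();            // the only intrinsic property the driver must care about
        DataStore getDataStore();
        URI getURI();              // translate the object into a URI for the execution layer
    }

    interface DataStore {
        long getId();
    }

    interface EndPoint { }         // an agent inside the ssvm, or a hypervisor host

    interface SnapshotInfo extends DataObject { }

    /** Driver-level API: simple, and neutral to the kind of object being operated on. */
    interface DataStoreDriver {
        URI grantAccess(DataObject obj, EndPoint ep);    // expose obj to ep, return a URI for it
        void revokeAccess(DataObject obj, EndPoint ep);
        List<DataObject> listObjects(DataStore store);
        Future<Void> createAsync(DataObject obj);
        Future<Void> deleteAsync(DataObject obj);
        Future<Void> copyAsync(DataObject src, DataObject dest);  // vendor-optimized migration
        boolean canCopy(DataObject src, DataObject dest);         // hint for the data motion service
    }

    /** Extra operations only a primary-storage driver needs. */
    interface PrimaryDataStoreDriver extends DataStoreDriver {
        Future<Void> takeSnapshot(SnapshotInfo snapshot);
        Future<Void> revertSnapshot(SnapshotInfo snapshot);
    }

Because the driver only sees DataObject and DataStore, the same implementation can serve a storage box used as both primary and image storage, which matches the "write the vendor code once" goal stated earlier in the thread.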
> >> > >> > >>> > >>> This interface does not mirror any that I am aware of the current JDK. > >>> Instead, it leverages the facilities it provides to abstract I/O > >>> operations between different types of devices (e.g. reading data > >>> from a socket and writing to a file or reading data from a socket > >>> and writing it to > >> another socket). > >>> Specifying the input or output stream allows the URI to remain > >>> logical and device agnostic because the device is being a physical > >>> stream from which to read or write with it. Therefore, specifying a > >>> logical URI without the associated stream would require implicit > >>> assumptions to be made by the StorageDevice and clients regarding > >>> data acquisition. To perform physical operations, one or more > >>> instances of StorageDevice would be passed into to the logical > >>> service methods to compose into a set of physical operations to > >>> perform logical operation (e.g. copying a template from secondary > storage to a volume). > >> > >> > >> I think our difference is only about the parameter of the API is an > >> URI or an Object. > >> Using an Object instead of a plain URI, using an object maybe more > >> flexible, and the DataObject itself has an API called: getURI, which > >> can translate the Object into an URI. See the interface of DataObject: > >> > https://cwiki.apache.org/confluence/download/attachments/30741569/dat > >> a > >> +model.jpg?version=1&modificationDate=1358171015660 > >> > >> > >>> > >>> StorageDevices are not intended to be content aware. They simply > >>> map logical URIs to the physical context they represent (a path on a > >>> filesystem, a bucket and key in an object store, a range of blocks > >>> in a block store, etc) and perform the requested operation on the > >>> physical context (i.e. read a byte stream from the physical location > >>> representing "/template/2/200", delete data represented by > >>> "/snapshot/3/300", list the contents of the physical location > >>> represented by "/volume/4/400", etc). In my opinion, it would be a > >>> misuse of a URI to infer an operation from their content. Instead, > >>> the VolumeService would expose a method such as the following to > >> perform the creation of a volume from a template: > >>> > >>> createVolumeFromTemplate(Template aTemplate, StorageDevice > >>> aTemplateDevice, Volume aVolume, StorageDevice aVolumeDevice, > >>> Hypervisor aHypervisor) > >>> > >>> The VolumeService would coordinate the creation of the volume with > >>> the passed hypervisor and, using the InputStream and OutputStreams > >>> provided by the devices, coordinate the transfer of data between the > >>> template storage device and the volume storage device. Ideally, the > >>> Template and Volume classes would encapsulate the rules for logical > >>> URI creation in a method. Similarly, the SnapshotService would > >>> expose the a method such as the following to take a snapshot of a > volume: > >>> > >>> createSnapshot(Volume aVolume, StorageDevice aSnapshotDevice) > >>> > >>> The SnapshotService would request the creation of a snapshot for the > >>> volume and then request a write of the snapshot data to the > >>> StorageDevice through the write method. > >> > >> I agree, the service has rich apis, while at the driver level, the > >> api should be as simple and neutral to the object operated on. 
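To make the Object-versus-URI point concrete, the following sketch shows roughly how a data motion strategy could turn the rich, object-level service call into plain URIs handed to an endpoint, as in the create-volume-from-template sequence diagram linked above. The local stand-in interfaces and the sendCopyCommand method are assumptions added so the sketch is self-contained; they are not the actual Javelin classes.

    import java.net.URI;

    // Minimal stand-ins corresponding to the thread's DataObject, DataStoreDriver and EndPoint;
    // the exact signatures here are assumptions.
    interface DataObject { }
    interface EndPoint { void sendCopyCommand(URI src, URI dest); }   // hypothetical command dispatch
    interface DataStoreDriver {
        URI grantAccess(DataObject obj, EndPoint ep);
        void revokeAccess(DataObject obj, EndPoint ep);
    }

    class DefaultDataMotionStrategy {
        /** "Create volume from template": grant access on both sides, hand the URIs to an
         *  endpoint (ssvm agent or hypervisor host), which performs the actual copy. */
        void copy(DataObject src, DataStoreDriver srcDriver,
                  DataObject dest, DataStoreDriver destDriver, EndPoint ep) {
            URI srcUri  = srcDriver.grantAccess(src, ep);
            URI destUri = destDriver.grantAccess(dest, ep);
            try {
                ep.sendCopyCommand(srcUri, destUri);
            } finally {
                srcDriver.revokeAccess(src, ep);
                destDriver.revokeAccess(dest, ep);
            }
        }
    }

The service layer stays object-aware, while the endpoint only ever sees URIs, which is the split being argued for above.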
> >> I updated the sequence diagrams:
> >> create volume from template: https://cwiki.apache.org/confluence/download/attachments/30741569/createvolumeFromtemplate.png?version=1&modificationDate=1358172931767
> >> add template into image storage: https://cwiki.apache.org/confluence/download/attachments/30741569/register+template+on+image+store.png?version=1&modificationDate=1358189565551
> >> take snapshot: https://cwiki.apache.org/confluence/download/attachments/30741569/take+snapshot+sequence.png?version=1&modificationDate=1358189965438
> >> backup snapshot into image storage: https://cwiki.apache.org/confluence/download/attachments/30741569/backup+snapshot+sequence.png?version=1&modificationDate=1358192407152
> >>
> >> Could you help to review them?
> >>
> >>> I hope these explanations clarify both the design and motivation of my proposal. I believe it is critical for the project's future development that the storage layer operate efficiently with storage devices that do not support traditional filesystems (e.g. object stores, raw block devices, etc). There are a fair number of these types of devices which CloudStack will likely need to support in the future. I believe that CloudStack will be well positioned to maintainably and efficiently support them if it carefully separates logical and physical storage operations.
> >>
> >> Thanks for your feedback. I rewrote the API last weekend based on your suggestions and updated the wiki: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+subsystem+2.0
> >> The code is in progress, but not checked into the javelin branch yet.
> >>
> >>> Thanks,
> >>> -John
> >>>
> >>> On Jan 9, 2013, at 8:10 PM, Edison Su <edison...@citrix.com> wrote:
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: John Burwell [mailto:jburw...@basho.com]
> >>>>> Sent: Tuesday, January 08, 2013 8:51 PM
> >>>>> To: cloudstack-dev@incubator.apache.org
> >>>>> Subject: Re: new storage framework update
> >>>>>
> >>>>> Edison,
> >>>>>
> >>>>> Please see my thoughts in-line below. I apologize in advance for the S3-centric nature of my examples -- it happens to be top of mind for obvious reasons ...
> >>>>>
> >>>>> Thanks,
> >>>>> -John
> >>>>>
> >>>>> On Jan 8, 2013, at 5:59 PM, Edison Su <edison...@citrix.com> wrote:
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: John Burwell [mailto:jburw...@basho.com]
> >>>>>>> Sent: Tuesday, January 08, 2013 10:59 AM
> >>>>>>> To: cloudstack-dev@incubator.apache.org
> >>>>>>> Subject: Re: new storage framework update
> >>>>>>>
> >>>>>>> Edison,
> >>>>>>>
> >>>>>>> In reviewing the javelin, I feel that there is a missing abstraction.
> >>>>>>> At the lowest level, storage operations are the storage, retrieval, deletion, and listing of byte arrays stored at a particular URI.
> >>>>>>> In order to implement this concept in the current Javelin > >>>>>>> branch, > >>>>>>> 3-5 strategy classes must implemented to perform the following > >>>>>>> low-level > >>>>> operations: > >>>>>>> > >>>>>>> * open(URI aDestinationURI): OutputStream throws IOException > >>>>>>> * write(URI aDestinationURI, OutputStream anOutputStream) > >> throws > >>>>>>> IOException > >>>>>>> * list(URI aDestinationURI) : Set<URI> throws IOException > >>>>>>> * delete(URI aDestinationURI) : boolean throws IOException > >>>>>>> > >>>>>>> The logic for each of these strategies will be identical which > >>>>>>> will lead to to the creation of a support class + glue code (i.e. > >>>>>>> either individual adapter classes > >>>>> > >>>>> I realize that I omitted a couple of definitions in my original > >>>>> email. First, the StorageDevice most likely would be implemented > >>>>> on a domain object that also contained configuration information > >>>>> for a resource. For example, the S3Impl class would also > >>>>> implement StorageDevice. On reflection (and a little pseudo > >>>>> coding), I would also like to refine my original proposed StorageDevice > interface: > >>>>> > >>>>> * void read(URI aURI, OutputStream anOutputStream) throws > >>> IOException > >>>>> * void write(URI aURI, InputStream anInputStream) throws > >> IOException > >>>>> * Set<URI> list(URI aURI) throws IOException > >>>>> * boolean delete(URI aURI) throws IOException > >>>>> * StorageDeviceType getType() > >>>>> > >>>>>> > >>>>>> If the lowest api is too opaque, like one URI as parameter, I am > >>>>>> wondering > >>>>> it may make the implementation more complicated than it sounds. > >>>>>> For example, there are at least 3 APIs for primary storage driver: > >>>>> createVolumeFromTemplate, createDataDisk, deleteVolume, and > two > >>>>> snapshot related APIs: createSnapshot, deleteSnapshot. > >>>>>> How to encode above operations into simple write/delete APIs? If > >>>>>> one URI > >>>>> contains too much information, then at the end of day, the > >>>>> receiver side(the code in hypervisor resource), who is responsible > >>>>> to decode the URI, is becoming complicated. That's the main > >>>>> reason, I decide to use more specific APIs instead of one opaque URI. > >>>>>> That's true, if the API is too specific, people needs to > >>>>>> implement ton of > >>>>> APIs(mainly imagedatastoredirver, primarydatastoredriver, > >>>>> backupdatastoredriver), and all over the place. > >>>>>> Which one is better? People can jump into discuss. > >>>>>> > >>>>> > >>>>> The URI scheme should be a logical, unique, and reversal values > >>>>> associated with the type of resource being stored. For example, > >>>>> the general form of template URIs would > >>>>> "/template/<account_id>/<template_id>/template.properties" and > >>>>> "/template/<account_id>/<template_id>/<uuid>.vhd" . Therefore, > >>>>> for account id 2, template id 200, the template.properties > >>>>> resource would be assigned a URI of > "/template/2/200/template.properties. > >>>>> The StorageDevice implementation translates the logical URI to a > >>>>> physical representation. Using > >>>>> S3 as an example, the StorageDevice is configured to use bucket > >>>>> jsb- cloudstack at endpoint s3.amazonaws.com. The S3 storage > >>>>> device would translate the URI to s3://jsb- > >>>>> cloudstack/templates/2/200/template.properties. 
> >>>>> For an NFS storage device mounted on nfs://localhost/cloudstack, the StorageDevice would translate the logical URI to nfs://localhost/cloudstack/template/<account_id>/<template_id>/template.properties. In short, I believe that we can devise a simple scheme that allows the StorageDevice to treat the URI path as relative to its root.
> >>>>>
> >>>>> To my mind, createVolumeFromTemplate is decomposable into a series of StorageDevice#read and StorageDevice#write operations which would be issued by the VolumeManager service, such as the following:
> >>>>>
> >>>>> public void createVolumeFromTemplate(Template aTemplate, StorageDevice aTemplateDevice, Volume aVolume, StorageDevice aVolumeDevice) {
> >>>>>
> >>>>>   try {
> >>>>>
> >>>>>     if (aVolumeDevice.getType() != StorageDeviceType.BLOCK && aVolumeDevice.getType() != StorageDeviceType.FILE_SYSTEM) {
> >>>>>       throw new UnsupportedStorageDeviceException(...);
> >>>>>     }
> >>>>>
> >>>>>     // Pull the template from the template device into a temporary directory
> >>>>>     final File aTemplateDirectory = new File(<template temp path>);
> >>>>>
> >>>>>     // Non-DRY -- likely a candidate for a TemplateService#downloadTemplate method
> >>>>>     aTemplateDevice.read(new URI("/templates/<account_id>/<template_id>/template.properties"), new FileOutputStream(aTemplateDirectory.createFile("template.properties")));
> >>>>>     aTemplateDevice.read(new URI("/templates/<account_id>/<template_id>/<template_uuid>.vhd"), new FileOutputStream(aTemplateDirectory.createFile("<template_uuid>.vhd")));
> >>>>>
> >>>>>     // Perform operations with the hypervisor as necessary to register the storage, which yields
> >>>>>     // anInputStream (possibly a List<InputStream>)
> >>>>>
> >>>>>     aVolumeDevice.write(new URI("/volume/<account_id>/<volume_id>"), anInputStream);
> >>>>
> >>>> Not sure we really need the API to look like java IO, but I can see the value of using a URI to encode objects (volume/snapshot/template etc): the driver-layer API will be very simple, and can be shared by multiple components (volume/image services etc).
> >>>> Currently, there is one datastore object for each storage. The datastore object is mainly used by the cloudstack mgt server to read/write the database and to maintain the state of each object (volume/snapshot/template) in the datastore. The datastore object also provides an interface for lifecycle management, and a transformer (which can transform a db object into a *TO, or a URI). The purpose of the datastore object is to offload a lot of logic from the volume/template manager into each object, as the manager is a singleton, which is not easy to extend.
> >>>> The relationships between these classes are:
> >>>> For the volume service: Volumeserviceimpl -> primarydatastore -> primarydatastoredriver
> >>>> For the image service: imageServiceImpl -> imagedataStore -> imagedataStoredriver
> >>>> For the snapshot service: snapshotServiceImpl -> {primarydataStore/imagedataStore} -> {primarydatastoredriver/imagedatastoredriver}; the snapshot can be on both the primarydatastore and the imagedatastore.
> >>>>
> >>>> The current driver API is not good enough; it's too specific to each object. For example, there will be an API called createSnapshot in the primarydatastoredriver, and an API called moveSnapshot in the imagedataStoredriver (in order to implement moving a snapshot from primary storage to the image store); there may also be an API called createVolume in the primarydatastoredriver, and an API called moveVolume in the imagedatastoredriver (in order to implement moving a volume from primary to the image store). The more objects we add, the more bloated the driver API will become.
> >>>>
> >>>> If the driver API uses the model you suggested, simple read/write/delete with a URI, for example:
> >>>> void create(URI uri) throws IOException
> >>>> void copy(URI destUri, URI srcUri) throws IOException
> >>>> boolean delete(URI uri) throws IOException
> >>>> Set<URI> list(URI uri) throws IOException
> >>>>
> >>>> then the create API has multiple meanings under different contexts: if the URI has "*/volume/*", it means creating a volume; if the URI has "*/template", it means creating a template; and so on.
> >>>> The same goes for the copy API:
> >>>> If both destUri and srcUri are volumes, it can have different meanings: if both volumes are in the same storage, it means creating a volume from a base volume; if they are in different storages, it means volume migration.
> >>>> If destUri is a volume while srcUri is a template, it means creating a volume from a template.
> >>>> If destUri is a volume and srcUri is a snapshot on the same storage, it means reverting a snapshot.
> >>>> If destUri is a volume and srcUri is a snapshot on a different storage, it means creating a volume from a snapshot.
> >>>> If destUri is a snapshot and srcUri is a volume, it means creating a snapshot from a volume.
> >>>> If destUri is a snapshot and srcUri is a snapshot in a different place, it means snapshot backup.
> >>>> If destUri is a template and srcUri is a snapshot, it means creating a template from a snapshot.
> >>>> As you can see, the API is too opaque and needs complicated logic to encode and decode the URIs.
> >>>> Are you OK with the above API?
> >>>>
> >>>>>   } catch (IOException e) {
> >>>>>
> >>>>>     // Log and handle the error ...
> >>>>>
> >>>>>   } finally {
> >>>>>
> >>>>>     // Close resources ...
> >>>>>
> >>>>>   }
> >>>>> }
> >>>>>
> >>>>> Dependent on the capabilities of the hypervisor's Java API, the temporary files may not be required, and an OutputStream could be copied directly to an InputStream.
> >>>>>
> >>>>>>> or a class that implements a ton of interfaces). In addition to this added complexity, this segmented approach prevents the implementation of common, logical storage features such as ACL enforcement and asset encryption.
> >>>>>>
> >>>>>> This is a good question: how to share code across multiple components. For example, one storage can be used as both primary storage and backup storage. In the current code, the developer needs to implement both the primarydataStoredriver and the backupdatastoredriver; to share code between these two drivers if needed, I think the developer can write one driver which implements both interfaces.
> >>>>>
> >>>>> In my opinion, having storage drivers classify their usage limits functionality and composability. Hence, my thought is that the StorageDevice should describe its capabilities -- allowing the various services (e.g. Image, Template, Volume, etc.) to determine whether or not the passed storage devices can support the requested operation.
> >>>>>>> With a common representation of a StorageDevice that operates on the standard Java I/O model, we can layer in cross-cutting storage operations in a consistent manner.
> >>>>>>
> >>>>>> I agree that it would be nice to have a standard device model, like the POSIX file system API in the Unix world. But I haven't figured out how to generalize all the operations on the storage, as I mentioned above.
> >>>>>> I can see that createvolumefromtemplate can be generalized as a link api, but how about taking a snapshot? And who will handle the difference between deleting a volume and deleting a snapshot, if they are using the same delete API?
> >>>>>
> >>>>> The following is a snippet that would be part of the SnapshotService to take a snapshot:
> >>>>>
> >>>>> // Ask the hypervisor to take a snapshot, which yields anInputStream (e.g. FileInputStream)
> >>>>>
> >>>>> aSnapshotDevice.write(new URI("/snapshots/<account_id>/<snapshot_id>"), anInputStream)
> >>>>>
> >>>>> Ultimately, a snapshot can be exported to a single file or OutputStream which can be written back out to a StorageDevice. For deleting a snapshot, the following snippet would perform the deletion in the SnapshotService:
> >>>>>
> >>>>> // Ask the hypervisor to delete the snapshot ...
> >>>>>
> >>>>> aSnapshotDevice.delete(new URI("/snapshots/<account_id>/<snapshot_id>"))
> >>>>>
> >>>>> Finally, the following snippet would delete a volume from the VolumeService:
> >>>>>
> >>>>> // Ask the hypervisor to delete the volume
> >>>>>
> >>>>> aVolumeDevice.delete(new URI("/volumes/<account_id>/<volume_id>"))
> >>>>>
> >>>>> In summary, I believe that the opaque operations specified in the StorageDevice interface can accomplish these goals if the following approaches are employed:
> >>>>>
> >>>>> * Logical, reversible URIs are constructed by the storage services. These URIs are translated by the StorageDevice implementation to the semantics of the underlying device.
> >>>>> * The storage service methods break their logic down into a series of operations against one or more StorageDevices. These operations should conform to common Java idioms because StorageDevice is built on the standard Java I/O model (i.e. InputStream, OutputStream, URI).
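A compact sketch of the StorageDevice idea summarized above may help. The interface methods are the ones listed in this thread; the NfsStorageDevice class, its constructor, and the resolve helper are assumptions added for illustration rather than existing CloudStack code.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URI;
    import java.util.Set;

    enum StorageDeviceType { BLOCK, FILE_SYSTEM, OBJECT }

    interface StorageDevice {
        void read(URI aURI, OutputStream anOutputStream) throws IOException;
        void write(URI aURI, InputStream anInputStream) throws IOException;
        Set<URI> list(URI aURI) throws IOException;
        boolean delete(URI aURI) throws IOException;
        StorageDeviceType getType();
    }

    /** Hypothetical device that maps a logical URI such as
     *  /template/2/200/template.properties onto a path under an NFS mount point. */
    abstract class NfsStorageDevice implements StorageDevice {
        private final File mountPoint;   // e.g. /mnt/cloudstack for nfs://localhost/cloudstack

        NfsStorageDevice(File mountPoint) { this.mountPoint = mountPoint; }

        /** The device treats the logical URI path as relative to its root. */
        protected File resolve(URI logicalUri) {
            return new File(mountPoint, logicalUri.getPath());
        }

        @Override
        public void read(URI aURI, OutputStream out) throws IOException {
            try (InputStream in = new FileInputStream(resolve(aURI))) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);   // only bytes flow across the interface
                }
            }
        }

        @Override
        public StorageDeviceType getType() { return StorageDeviceType.FILE_SYSTEM; }
    }

An object-store implementation would do the same translation to a bucket and key (e.g. bucket jsb-cloudstack, key templates/2/200/template.properties) and stream the bytes through its client library, which is exactly the point of keeping the URI logical.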
> >>>>>
> >>>>> Thanks,
> >>>>> -John
> >>>>>
> >>>>>>> Based on this line of thought, I propose the addition of the following notions to the storage framework:
> >>>>>>>
> >>>>>>> * StorageType (Enumeration)
> >>>>>>>   * BLOCK (raw block devices such as iSCSI, NBD, etc)
> >>>>>>>   * FILE_SYSTEM (devices addressable through the filesystem such as local disks, NFS, etc)
> >>>>>>>   * OBJECT (object stores such as S3 and Swift)
> >>>>>>> * StorageDevice (interface)
> >>>>>>>   * open(URI aDestinationURI): OutputStream throws IOException
> >>>>>>>   * write(URI aDestinationURI, OutputStream anOutputStream) throws IOException
> >>>>>>>   * list(URI aDestinationURI) : Set<URI> throws IOException
> >>>>>>>   * delete(URI aDestinationURI) : boolean throws IOException
> >>>>>>>   * getType() : StorageType
> >>>>>>> * UnsupportedStorageDevice (unchecked exception): thrown when an unsuitable device type is provided to a storage service.
> >>>>>>>
> >>>>>>> All operations on the higher-level storage services (e.g. ImageService) would accept a StorageDevice parameter on their operations. Using the type property, services can determine whether or not the passed device is suitable (e.g. guarding against the use of an object store such as S3 as a VM disk) -- throwing an UnsupportedStorageDevice exception when a device is unsuitable for the requested operation. The services would then perform all storage operations through the passed StorageDevice.
> >>>>>>>
> >>>>>>> One potential gap is security. I do not know whether or not authorization decisions are assumed to occur up the stack from the storage engine or as part of it.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> -John
> >>>>>>>
> >>>>>>> P.S. I apologize for taking so long to push my feedback. I am just getting back on station from the birth of our second child.
> >>>>>>
> >>>>>> Congratulations! Thanks for your great feedback.
> >>>>>>
> >>>>>>> On Dec 28, 2012, at 8:09 PM, Edison Su <edison...@citrix.com> wrote:
> >>>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: Marcus Sorensen [mailto:shadow...@gmail.com]
> >>>>>>>>> Sent: Friday, December 28, 2012 2:56 PM
> >>>>>>>>> To: cloudstack-dev@incubator.apache.org
> >>>>>>>>> Subject: Re: new storage framework update
> >>>>>>>>>
> >>>>>>>>> Thanks. I'm trying to picture how this will change the existing code.
> >>>>>>>>> I think it is something I will need a real example to understand.
> >>>>>>>>> Currently we pass a
> >>>>>>>>
> >>>>>>>> Yah, the example code is in these files:
> >>>>>>>> XenNfsConfigurator
> >>>>>>>> DefaultPrimaryDataStoreDriverImpl
> >>>>>>>> DefaultPrimaryDatastoreProviderImpl
> >>>>>>>> VolumeServiceImpl
> >>>>>>>> DefaultPrimaryDataStore
> >>>>>>>> XenServerStorageResource
> >>>>>>>>
> >>>>>>>> You can start from the volumeServiceTest -> createVolumeFromTemplate test case.
> >>>>>>>>
> >>>>>>>>> storageFilerTO and/or volumeTO from the server to the agent, and the agent
> >>>>>>>>
> >>>>>>>> The model is not changed; what changed are the commands sent to the resource. Right now, each storage protocol can send its own commands to the resource.
> >>>>>>>> All the storage-related commands are put under the org.apache.cloudstack.storage.command package.
Take > >>>>>>> CopyTemplateToPrimaryStorageCmd as an example, > >>>>>>>> It has a field called ImageOnPrimayDataStoreTO, which contains > >>>>>>>> a > >>>>>>> PrimaryDataStoreTO. PrimaryDataStoreTO contains the basic > >>>>>>> information about a primary storage. If needs to send extra > >>>>>>> information to resource, one can subclass PrimaryDataStoreTO, e.g. > >>>>>>> NfsPrimaryDataStoreTO, which contains nfs server ip, and nfs path. > >>>>>>> In this way, one can write a CLVMPrimaryDataStoreTO, which > >>>>>>> contains clvm's > >>>>> own special information if > >>>>>>> needed. Different protocol uses different TO can simply the code, > >> and > >>>>>>> easier to add new storage. > >>>>>>>> > >>>>>>>>> does all of the work. Do we still need things like > >>>>>>>>> LibvirtStorageAdaptor to do the work on the agent side of > >>>>>>>>> actually managing the volumes/pools and implementing them, > >>>>>>>>> connecting > >>>>> them > >>>>>>> to > >>>>>>>>> vms? So in implementing new storage we will need to write both > >>>>>>>>> a configurator and potentially a storage adaptor? > >>>>>>>> > >>>>>>>> Yes, that's minimal requirements. > >>>>>>>> > >>>>>>>>> On Dec 27, 2012 6:41 PM, "Edison Su" <edison...@citrix.com> > >> wrote: > >>>>>>>>> > >>>>>>>>>> Hi All, > >>>>>>>>>> Before heading into holiday, I'd like to update the > >>>>>>>>>> current status of the new storage framework since last collab12. > >>>>>>>>>> 1. Class diagram of primary storage is evolved: > >>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>> > >>> > >> > https://cwiki.apache.org/confluence/download/attachments/30741569/sto > >>>>>>>>> r > >>>>>>>>> age.jpg?version=1&modificationDate=1356640617613 > >>>>>>>>>> Highlight the current design: > >>>>>>>>>> a. One storage provider can cover multiple storage > >>>>>>>>>> protocols for multiple hypervisors. The default storage > >>>>>>>>>> provider can almost cover all the current primary storage > >>>>>>>>>> protocols. In most of cases, you don't need to write a new > >>>>>>>>>> storage provider, what you need to do is to write a new > >>>>>>>>>> storage > >> configurator. > >>>>>>>>>> Write a new storage provider needs to write a lot of code, > >>>>>>>>>> which we should avoid it as much as > >>>>>>>>> possible. > >>>>>>>>>> b. A new type hierarchy, primaryDataStoreConfigurator, > >>>>>>>>>> is > >>> added. > >>>>>>>>>> The configurator is a factory for primaryDataStore, which > >>>>>>>>>> assemble StorageProtocolTransformer, > >> PrimaryDataStoreLifeCycle > >>>>>>>>>> and PrimaryDataStoreDriver for PrimaryDataStore object, > based > >>>>>>>>>> on the hypervisor type and the storage protocol. For > >>>>>>>>>> example, for nfs primary storage on xenserver, there is a > >>>>>>>>>> class called XenNfsConfigurator, which put > >>>>>>>>>> DefaultXenPrimaryDataStoreLifeCycle, > >>>>>>>>>> NfsProtocolTransformer and > DefaultPrimaryDataStoreDriverImpl > >>>>>>>>>> into DefaultPrimaryDataStore. One provider can only have one > >>>>>>>>>> configurator for a pair of hypervisor type and storage protocol. > >>>>>>>>>> For example, if you want to add a new nfs protocol > >>>>>>>>>> configurator for xenserver hypervisor, you need to write a > >>>>>>>>>> new > >> storage provider. > >>>>>>>>>> c. A new interface, StorageProtocolTransformer, is added. > >>>>>>>>>> The main purpose of this interface is to handle the > >>>>>>>>>> difference between different storage protocols. 
> >>>>>>>>>> It has four methods:
> >>>>>>>>>>    getInputParamNames: returns a list of the parameter names for a particular protocol. E.g. the NFS protocol has ["server", "path"], ISCSI has ["iqn", "lun"], etc. The UI shouldn't hardcode these parameters any more.
> >>>>>>>>>>    normalizeUserInput: given user input from the UI/API, validates the input, breaks it apart, and stores the pieces into the database.
> >>>>>>>>>>    getDataStoreTO/getVolumeTO: each protocol can have its own volumeTO and primaryStorageTO. A TO is the object that will be passed down to the resource; if your storage has extra information you want to pass to the resource, these two methods are the place you can override.
> >>>>>>>>>> d. All the time-consuming API calls related to storage are async.
> >>>>>>>>>>
> >>>>>>>>>> 2. Minimal functionalities are implemented:
> >>>>>>>>>>    a. Can register an http template, without the SSVM
> >>>>>>>>>>    b. Can register an NFS primary storage for xenserver
> >>>>>>>>>>    c. Can download a template into primary storage directly
> >>>>>>>>>>    d. Can create a volume from a template
> >>>>>>>>>>
> >>>>>>>>>> 3. All about test:
> >>>>>>>>>>    a. The TestNG test framework is used, as it can provide parameters for each test case. For integration tests, we need to know the ip address of the hypervisor host, the host uuid (if it's xenserver), the primary storage url, the template url, etc. These configurations are better parameterized, so for each test run we don't need to modify the test case itself; instead, we provide a test configuration file for each test run. The TestNG framework already has this functionality, I just reuse it.
> >>>>>>>>>>    b. Every piece of code can be unit tested, which means:
> >>>>>>>>>>       b.1 the xcp plugin can be unit tested. I wrote a small piece of python code, called mockxcpplugin.py, which can directly call the xcp plugin.
> >>>>>>>>>>       b.2 the direct agent hypervisor resource can be tested. I wrote a mock agent manager, which can load and initialize a hypervisor resource, and can also send commands to the resource.
> >>>>>>>>>>       b.3 a storage integration test maven project is created, which can test the whole storage subsystem, such as creating a volume from a template, which includes both the image and volume components.
> >>>>>>>>>> A new section, called "how to test", has been added to https://cwiki.apache.org/confluence/display/CLOUDSTACK/Storage+subsystem+2.0, please check it out.
> >>>>>>>>>>
> >>>>>>>>>> The code is on the javelin branch; the maven projects whose names start with cloud-engine-storage-* are the code related to the storage subsystem. Most of the primary storage code is in the cloud-engine-storage-volume project.
> >>>>>>>>>> Any feedback/comment is appreciated.
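For readers who skimmed the quoted announcement, here is a minimal sketch of the StorageProtocolTransformer idea described above. Only the interface name, the four method names, and the NfsProtocolTransformer name come from the thread; the signatures and the use of a Map in place of the real TO classes are assumptions for illustration.

    import java.net.URI;
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /** Sketch of the per-protocol transformer; signatures are assumed, not the javelin code. */
    interface StorageProtocolTransformer {
        List<String> getInputParamNames();                  // what the UI should prompt for
        Map<String, String> normalizeUserInput(String url); // validate the input and break it apart
        Map<String, String> getDataStoreTO(long storeId);   // protocol-specific info for the resource
        Map<String, String> getVolumeTO(long volumeId);
    }

    class NfsProtocolTransformer implements StorageProtocolTransformer {
        public List<String> getInputParamNames() {
            return Arrays.asList("server", "path");         // NFS needs a server and an export path
        }
        public Map<String, String> normalizeUserInput(String url) {
            URI uri = URI.create(url);                      // e.g. nfs://10.0.0.5/export/primary
            Map<String, String> params = new HashMap<>();
            params.put("server", uri.getHost());
            params.put("path", uri.getPath());
            return params;                                  // the caller persists these to the database
        }
        public Map<String, String> getDataStoreTO(long storeId) {
            return new HashMap<>();                         // would carry the nfs server ip and path
        }
        public Map<String, String> getVolumeTO(long volumeId) {
            return new HashMap<>();
        }
    }

A configurator such as XenNfsConfigurator would then pair a transformer like this with a lifecycle and a driver when it assembles a DefaultPrimaryDataStore for the (xenserver, nfs) combination, as the quoted announcement describes.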