I agree, John - let's get consensus first, then talk time tables.
On Tue, Aug 20, 2013 at 4:31 PM, John Burwell <jburw...@basho.com> wrote:

> Mike,
>
> Before we can dig into timelines or implementations, I think we need to
> get consensus on the problem to be solved and the goals. Once we have a
> proper understanding of the scope, I believe we can chunk the work across
> a set of development lifecycles. The subject is vast, but it also has a
> far-reaching impact on both the storage and network layer evolution
> efforts. As such, I believe we need to start addressing it as part of the
> next release.
>
> As a separate thread, we need to discuss the timeline for the next
> release. I think we need to avoid the time compression caused by the
> overlap of the 4.1 stabilization effort and 4.2 development. Therefore, I
> don't think we should consider development of the next release started
> until the first 4.2 RC is released. I will try to open a separate discuss
> thread for this topic, as well as tying in the discussion of release code
> names.
>
> Thanks,
> -John
>
> On Aug 20, 2013, at 6:22 PM, Mike Tutkowski <mike.tutkow...@solidfire.com>
> wrote:
>
> > Hey John,
> >
> > I think this is some great stuff. Thanks for the write up.
> >
> > It looks like you have ideas around what might go into a first release of
> > this plug-in framework. Were you thinking we'd have enough time to squeeze
> > that first rev into 4.3? I'm just wondering (it's not a huge deal to hit
> > that release for this) because we would only have about five weeks.
> >
> > Thanks
> >
> > On Tue, Aug 20, 2013 at 3:43 PM, John Burwell <jburw...@basho.com> wrote:
> >
> >> All,
> >>
> >> In capturing my thoughts on storage, my thinking backed into the driver
> >> model. While we have the beginnings of such a model today, I see the
> >> following deficiencies:
> >>
> >>    1. *Multiple Models*: The Storage, Hypervisor, and Security layers
> >>    each have a slightly different model for allowing system functionality
> >>    to be extended/substituted.
> >>    These differences increase the barrier to entry for vendors seeking
> >>    to extend CloudStack and accrete code paths to be maintained and
> >>    verified.
> >>    2. *Leaky Abstraction*: Plugins are registered through a Spring
> >>    configuration file. In addition to being operator unfriendly (most
> >>    sysadmins are not Spring experts, nor do they want to be), we expose
> >>    the core bootstrapping mechanism to operators. Therefore, a
> >>    misconfiguration could negatively impact the injection/configuration
> >>    of internal management server components. We are essentially handing
> >>    them a loaded shotgun pointed at our right foot.
> >>    3. *Nondeterministic Load/Unload Model*: Because the core loading
> >>    mechanism is Spring, the management server has little control over
> >>    the timing and order of component loading/unloading. Changes to the
> >>    Management Server's component dependency graph could break a driver
> >>    by causing it to be started at an unexpected time.
> >>    4. *Lack of Execution Isolation*: As Spring components, plugins are
> >>    loaded into the same execution context as core management server
> >>    components. Therefore, an errant plugin can corrupt the entire
> >>    management server.
> >>
> >> For the next revision of the plugin/driver mechanism, I would like to see
> >> us migrate towards a standard pluggable driver model that supports all of
> >> the management server's extension points (e.g. network devices, storage
> >> devices, hypervisors, etc) with the following capabilities:
> >>
> >>    - *Consolidated Lifecycle and Startup Procedure*: Drivers share a
> >>    common state machine and categorization (e.g. network, storage,
> >>    hypervisor, etc) that permits the deterministic calculation of
> >>    initialization and destruction order (i.e. network layer drivers ->
> >>    storage layer drivers -> hypervisor drivers). Plugin
> >>    inter-dependencies would be supported between plugins sharing the
> >>    same category.
> >>    - *In-process Installation and Upgrade*: Adding or upgrading a driver
> >>    does not require the management server to be restarted. This
> >>    capability implies a system that supports the simultaneous execution
> >>    of multiple driver versions and the ability to suspend continued
> >>    execution of work on a resource while the underlying driver instance
> >>    is replaced.
> >>    - *Execution Isolation*: The deployment packaging and execution
> >>    environment allows different (and potentially conflicting) versions
> >>    of dependencies to be used simultaneously. Additionally, plugins
> >>    would be sufficiently sandboxed to protect the management server
> >>    against driver instability.
> >>    - *Extension Data Model*: Drivers provide a property bag with a
> >>    metadata descriptor to validate and render vendor specific data. The
> >>    contents of this property bag will be provided to every driver
> >>    operation invocation at runtime. The metadata descriptor would be a
> >>    lightweight description that provides a label resource key, a
> >>    description resource key, data type (string, date, number, boolean),
> >>    required flag, and optional length limit.
> >>    - *Introspection*: Administrative APIs/UIs allow operators to
> >>    understand the drivers in the system, their configuration, and their
> >>    current state.
> >>    - *Discoverability*: Optionally, drivers can be discovered via a
> >>    project repository definition (similar to Yum) allowing drivers to be
> >>    remotely acquired and operators to be notified regarding update
> >>    availability. The project would also provide, free of charge,
> >>    certificates to sign plugins. This mechanism would support local
> >>    mirroring to support air gapped management networks.
> >>
> >> Fundamentally, I do not want to turn CloudStack into an erector set with
> >> more screws than nuts, which is a risk with highly pluggable
> >> architectures.
> >> As such, I think we would need to tightly bound the scope of drivers and
> >> their behaviors to prevent the loss of system usability and stability.
> >> My thinking is that drivers would be packaged into a custom JAR, CAR
> >> (CloudStack ARchive), that would be structured as follows:
> >>
> >>    - META-INF
> >>    - MANIFEST.MF
> >>    - driver.yaml (driver metadata (e.g. version, name, description,
> >>    etc) serialized in YAML format)
> >>    - LICENSE (a text file containing the driver's license)
> >>    - lib (driver dependencies)
> >>    - classes (driver implementation)
> >>    - resources (driver message files and potentially JS resources)
> >>
> >> The management server would acquire drivers through a simple scan of a
> >> URL (e.g. file directory, S3 bucket, etc). For every CAR object found,
> >> the management server would create an execution environment (likely a
> >> dedicated ExecutorService and Classloader), and transition the state of
> >> the driver to Running (the exact state model would need to be worked
> >> out). To be really nice, we could develop a custom Ant task/Maven
> >> plugin/Gradle plugin to create CARs. I can also imagine opportunities to
> >> add hooks to this model to register instrumentation information with JMX
> >> and authorization.
> >>
> >> To keep the scope of this email confined, we would introduce the general
> >> notion of a Resource, and (hand wave hand wave) eventually
> >> compartmentalize the execution of work around a resource [1]. This (hand
> >> waved) compartmentalization would give us the controls necessary to
> >> safely and reliably perform in-place driver upgrades. For an initial
> >> release, I would recommend implementing the abstractions, loading
> >> mechanism, extension data model, and discovery features. With these
> >> capabilities in place, we could attack the in-place upgrade model.
> >>
> >> If we were to adopt such a pluggable capability, we would have the
> >> opportunity to decouple the vendor and CloudStack release schedules. For
> >> example, if a vendor were introducing a new product that required a new
> >> or updated driver, they would no longer need to wait for a CloudStack
> >> release to support it. They would also gain the ability to fix high
> >> priority defects in the same manner.
> >>
> >> I have hand waved a number of issues that would need to be resolved
> >> before such an approach could be implemented. However, I think we need
> >> to decide, as a community, that it is worth devoting energy and effort
> >> to enhancing the plugin/driver model, and agree on the goals of that
> >> effort, before diving head first into the deep rabbit hole of
> >> design/implementation.
> >>
> >> Thoughts? (/me ducks)
> >> -John
> >>
> >> [1]: My opinions on the matter from CloudStack Collab 2013 ->
> >> http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management
> >
> > --
> > *Mike Tutkowski*
> > *Senior CloudStack Developer, SolidFire Inc.*
> > e: mike.tutkow...@solidfire.com
> > o: 303.746.7302
> > Advancing the way the world uses the
> > cloud<http://solidfire.com/solution/overview/?video=play>
> > *™*

--
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkow...@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*™*
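
P.S. To make the *Extension Data Model* item in John's proposal a bit more concrete, here is a rough sketch of how a metadata descriptor might validate a property bag. Nothing here comes from CloudStack itself: the names `FieldDescriptor` and `validate`, the error strings, the rejection of undeclared keys, and the choice to apply the length limit only to strings are all assumptions made for illustration. It is written in Python purely for brevity; a real implementation would presumably be Java.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict, List, Optional

# The four data types named in the proposal. Mapping them onto these
# Python types is an assumption of this sketch.
TYPES = {"string": str, "date": date, "number": (int, float), "boolean": bool}


@dataclass(frozen=True)
class FieldDescriptor:
    """One entry of a driver's metadata descriptor (hypothetical shape)."""
    name: str
    label_key: str                    # resource key for the label
    description_key: str              # resource key for the description
    data_type: str                    # "string", "date", "number" or "boolean"
    required: bool = False            # required flag
    max_length: Optional[int] = None  # optional length limit


def validate(descriptor: List[FieldDescriptor], bag: Dict[str, Any]) -> List[str]:
    """Return the validation errors for a property bag (empty list if valid)."""
    errors = []
    known = {field.name for field in descriptor}
    # Assumption: keys the descriptor does not declare are rejected.
    for key in bag:
        if key not in known:
            errors.append(f"{key}: not declared in descriptor")
    for field in descriptor:
        if field.name not in bag:
            if field.required:
                errors.append(f"{field.name}: required value is missing")
            continue
        value = bag[field.name]
        type_ok = isinstance(value, TYPES[field.data_type])
        # bool is a subclass of int in Python, so exclude it from "number".
        if field.data_type == "number" and isinstance(value, bool):
            type_ok = False
        if not type_ok:
            errors.append(f"{field.name}: expected {field.data_type}")
        elif (field.max_length is not None and isinstance(value, str)
              and len(value) > field.max_length):
            errors.append(f"{field.name}: longer than {field.max_length} characters")
    return errors
```

A driver would ship a list of `FieldDescriptor` entries alongside its other metadata, and the management server could call `validate(descriptor, bag)` before handing the bag to any driver operation. The same descriptor could drive UI rendering through the label and description resource keys.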