Hi John,

This is really good thinking on the future of the CloudStack driver/plugin
architecture.

I would be very happy if we can decouple the ACS releases from vendor
specific releases.

I also agree on the research & experiments (POCs, tools) that need to be
undertaken to determine whether decoupled drivers, driver upgrade, coexistence
of multiple drivers, hot driver deployment, etc. can actually work in the JVM.

Note - This approach reminds me of the Vert.x modules & containers that
try to manage this problem along with other concerns (read:
instrumentation, etc.)

Regards,
Amit
*CloudByte Inc.* <http://www.cloudbyte.com/>


On Wed, Aug 21, 2013 at 6:22 AM, Darren Shepherd <
darren.s.sheph...@gmail.com> wrote:

> Sure, I fully understand how it theoretically works, but I'm saying from a
> practical perspective it always seems to fall apart.  What you're describing
> is done excellently in OSGi 4.2 Blueprint.  It's a beautiful framework that
> allows you to expose services that can be dynamically updated at runtime.
>
> The issue always happens with unloading.  I'll give you a real world
> example.  As part of the servlet spec you're supposed to be able to stop and
> unload wars.  But in practice, if you do it enough times, you typically run
> out of memory.  One such issue was with commons logging (since fixed).
> When you do getLogger(MyClass.class), it would cache a reference from the
> Class object to the actual log impl.  The commons logging jar is typically
> loaded with a system classloader, but MyClass.class would be loaded in
> the webapp classloader.  So when you stop the war, there is a reference
> chain: system classloader -> LogFactory -> MyClass -> webapp classloader.
> So the web app never gets GC'd.
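[Editorial sketch, not part of the original thread] The retention chain Darren describes can be reproduced in miniature: a cache keyed by Class, held by a factory loaded high in the classloader hierarchy, pins the webapp's classes (and therefore the webapp classloader) long after the war is stopped. The names here (LogFactory, LeakDemo) are illustrative stand-ins, not the real commons-logging code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the old commons-logging LogFactory: a factory
// loaded by a long-lived (system) classloader that caches loggers by Class.
class LogFactory {
    // This static cache lives as long as the factory's classloader does.
    private static final Map<Class<?>, Object> CACHE = new ConcurrentHashMap<>();

    static Object getLogger(Class<?> clazz) {
        // Caching the Class object creates the reference chain:
        //   system classloader -> LogFactory -> CACHE -> MyClass
        //     -> MyClass.getClassLoader() (the webapp classloader)
        return CACHE.computeIfAbsent(clazz, c -> new Object());
    }

    static int cacheSize() {
        return CACHE.size();
    }
}

class LeakDemo {
    public static void main(String[] args) {
        // In a real container, the class passed in would come from the webapp
        // classloader; here we use LeakDemo itself to show the retained entry.
        LogFactory.getLogger(LeakDemo.class);
        System.out.println("cached entries: " + LogFactory.cacheSize());
        // Even after the webapp is "stopped", the cache entry pins the Class,
        // so the webapp classloader (and everything it loaded) cannot be GC'd.
    }
}
```

Because the cache is never cleared when the war stops, repeated deploy/undeploy cycles accumulate unreclaimable classloaders until the JVM runs out of memory.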
>
> So just pointing out the practical issues, that's it.
>
> Darren
>
> On Aug 20, 2013, at 5:31 PM, John Burwell <jburw...@basho.com> wrote:
>
> > Darren,
> >
> > Actually, loading and unloading aren't difficult if resource management
> and drivers work within the following constraints/assumptions:
> >
> > - Drivers are transient and stateless
> > - A driver instance is assigned per resource managed (i.e. no singletons)
> > - A lightweight thread and mailbox (i.e. actor model) are assigned per
> resource managed (outlined in the presentation referenced below)
> >
> > Based on these constraints and assumptions, the following upgrade
> process could be implemented:
> >
> > 1. Load and verify the new driver version to make it available
> > 2. Notify the supervisor processes of each affected resource that a new
> driver is available
> > 3. Upon completion of the current message being processed by its associated
> actor, the supervisor kills and respawns the actor managing its associated
> resource
> > 4. As part of startup, the supervisor injects an instance of the new driver
> version and the actor resumes processing messages in its mailbox
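[Editorial sketch] A minimal version of the upgrade loop described above, with illustrative names (Driver, ResourceActor) that are not CloudStack APIs; the kill-and-respawn of the actor is approximated here by an atomic driver swap that takes effect only between messages, so an in-flight operation always completes on the old version.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical driver contract: transient and stateless, per the constraints above.
interface Driver {
    String handle(String message);
}

// One "actor" per managed resource: a mailbox drained one message at a time.
class ResourceActor {
    private final Queue<String> mailbox = new ConcurrentLinkedQueue<>();
    private final AtomicReference<Driver> driver;

    ResourceActor(Driver initial) {
        driver = new AtomicReference<>(initial);
    }

    // Enqueue a message for this resource.
    void tell(String message) {
        mailbox.add(message);
    }

    // Supervisor-driven upgrade: the new driver version takes effect only
    // between messages, never in the middle of one.
    void upgrade(Driver newVersion) {
        driver.set(newVersion);
    }

    // Process the next queued message with whichever driver version is current.
    String processOne() {
        String msg = mailbox.poll();
        return msg == null ? null : driver.get().handle(msg);
    }
}

class UpgradeDemo {
    public static void main(String[] args) {
        ResourceActor actor = new ResourceActor(m -> "v1:" + m);
        actor.tell("create-volume");
        actor.tell("snapshot");
        System.out.println(actor.processOne()); // handled by the v1 driver
        actor.upgrade(m -> "v2:" + m);          // in-place driver upgrade
        System.out.println(actor.processOne()); // handled by the v2 driver
    }
}
```

Because the driver is stateless and scoped per resource, no state migration is needed at swap time; the mailbox preserves ordering across the upgrade.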
> >
> > This process mirrors what would occur on management server
> startup for each resource, minus killing an existing actor instance.
>  Eventually, the system will upgrade the driver without loss of operation.
>  More sophisticated policies could be added, but I think this approach
> would be a solid default upgrade behavior.  As a bonus, this same approach
> could also be applied to global configuration settings -- allowing the
> system to apply changes to these values without restarting the system.
> >
> > In summary, CloudStack and Eclipse are very different types of systems.
>  Eclipse is a desktop application implementing complex workflows, user
> interactions, and management of shared state (e.g. project structure, AST,
> compiler status, etc).  In contrast, CloudStack is an eventually consistent
> distributed system performing automation control.  As such, its plugin
> requirements are not only very different, but IMHO, much simpler.
> >
> > Thanks,
> > -John
> >
> > On Aug 20, 2013, at 7:44 PM, Darren Shepherd <
> darren.s.sheph...@gmail.com> wrote:
> >
> >> I know this isn't terribly useful, but I've been drawing a lot of
> squares and circles and lines that connect those squares and circles lately
> and I have a lot of architectural ideas for CloudStack.  At the rate I'm
> going it will take me about two weeks to put together a discussion/proposal
> for the community.  What I'm thinking is a superset of what you've listed
> out and should align with your idea of a CAR.  The focus has a lot to do
> with modularity and extensibility.
> >>
> >> So more to come soon....  I will say one thing, though: with Java you
> end up having a hard time doing dynamic loading and unloading of modules.
>  There's plenty of frameworks that try really hard to do this right, like
> OSGi, but it's darn near impossible to do it right because of class loading
> and GC issues (and that's why Eclipse has you restart after installing
> plugins even though it is OSGi).
> >>
> >> I do believe that CloudStack should be capable of zero-downtime
> maintenance, and I have ideas around that, but at the end of the day, for
> plenty of practical reasons, you still need a JVM restart if modules change.
> >>
> >> Darren
> >>
> >> On Aug 20, 2013, at 3:39 PM, Mike Tutkowski <
> mike.tutkow...@solidfire.com> wrote:
> >>
> >>> I agree, John - let's get consensus first, then talk time tables.
> >>>
> >>>
> >>> On Tue, Aug 20, 2013 at 4:31 PM, John Burwell <jburw...@basho.com>
> wrote:
> >>>
> >>>> Mike,
> >>>>
> >>>> Before we can dig into timelines or implementations, I think we need to
> >>>> get consensus on the problem to be solved and the goals.  Once we have a
> >>>> proper understanding of the scope, I believe we can chunk the work across
> >>>> a set of development lifecycles.  The subject is vast, but it also has a
> >>>> far-reaching impact on both the storage and network layer evolution
> >>>> efforts.
> >>>> As such, I believe we need to start addressing it as part of the next
> >>>> release.
> >>>>
> >>>> As a separate thread, we need to discuss the timeline for the next
> >>>> release.  I think we need to avoid the time compression caused by the
> >>>> overlap of the 4.1 stabilization effort and 4.2 development.
>  Therefore, I
> >>>> don't think we should consider development of the next release started
> >>>> until the first 4.2 RC is released.  I will try to open a separate
> >>>> discussion thread for this topic, as well as tying in the discussion of
> >>>> release code names.
> >>>>
> >>>> Thanks,
> >>>> -John
> >>>>
> >>>> On Aug 20, 2013, at 6:22 PM, Mike Tutkowski <
> mike.tutkow...@solidfire.com>
> >>>> wrote:
> >>>>
> >>>>> Hey John,
> >>>>>
> >>>>> I think this is some great stuff. Thanks for the write up.
> >>>>>
> >>>>> It looks like you have ideas around what might go into a first release
> >>>>> of this plug-in framework. Were you thinking we'd have enough time to
> >>>>> squeeze that first rev into 4.3? I'm just wondering (it's not a huge
> >>>>> deal to hit that release for this) because we would only have about
> >>>>> five weeks.
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>>
> >>>>> On Tue, Aug 20, 2013 at 3:43 PM, John Burwell <jburw...@basho.com>
> >>>> wrote:
> >>>>>
> >>>>>> All,
> >>>>>>
> >>>>>> In capturing my thoughts on storage, my thinking backed into the
> driver
> >>>>>> model.  While we have the beginnings of such a model today, I see
> the
> >>>>>> following deficiencies:
> >>>>>>
> >>>>>>
> >>>>>> 1. *Multiple Models*: The Storage, Hypervisor, and Security layers
> >>>>>> each have a slightly different model for allowing system
> >>>> functionality to
> >>>>>> be extended/substituted.  These differences increase the barrier of
> >>>> entry
> >>>>>> for vendors seeking to extend CloudStack and accrete code paths to
> be
> >>>>>> maintained and verified.
> >>>>>> 2. *Leaky Abstraction*:  Plugins are registered through a Spring
> >>>>>> configuration file.  In addition to being operator unfriendly (most
> >>>>>> sysadmins are not Spring experts nor do they want to be), we expose
> >>>> the
> >>>>>> core bootstrapping mechanism to operators.  Therefore, a
> >>>> misconfiguration
> >>>>>> could negatively impact the injection/configuration of internal
> >>>> management
> >>>>>> server components.  Essentially handing them a loaded shotgun
> pointed
> >>>> at
> >>>>>> our right foot.
> >>>>>> 3. *Nondeterministic Load/Unload Model*:  Because the core loading
> >>>>>> mechanism is Spring, the management server has little control over the
> >>>> timing and
> >>>>>> order of component loading/unloading.  Changes to the Management
> >>>> Server's
> >>>>>> component dependency graph could break a driver by causing it to be
> >>>> started
> >>>>>> at an unexpected time.
> >>>>>> 4. *Lack of Execution Isolation*: As a Spring component, plugins are
> >>>>>> loaded into the same execution context as core management server
> >>>>>> components.  Therefore, an errant plugin can corrupt the entire
> >>>> management
> >>>>>> server.
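[Editorial sketch] For illustration of the "leaky abstraction" point above, registering a plugin today amounts to hand-editing Spring wiring along these lines (a hypothetical fragment, not an exact excerpt from a real CloudStack context file); a malformed bean definition here can break injection of the management server's own components at bootstrap:

```xml
<!-- Hypothetical vendor plugin wired directly into the core Spring context.
     A typo in this file is loaded by the same mechanism that bootstraps the
     management server itself, so a mistake can take down core components. -->
<bean id="acmeStorageProvider"
      class="com.example.acme.AcmeSanStorageProvider">
  <property name="apiEndpoint" value="https://san.example.com/api" />
</bean>
```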
> >>>>>>
> >>>>>>
> >>>>>> For the next revision of the plugin/driver mechanism, I would like to
> see us
> >>>>>> migrate towards a standard pluggable driver model that supports all
> of
> >>>> the
> >>>>>> management server's extension points (e.g. network devices, storage
> >>>>>> devices, hypervisors, etc) with the following capabilities:
> >>>>>>
> >>>>>>
> >>>>>> - *Consolidated Lifecycle and Startup Procedure*:  Drivers share a
> >>>>>> common state machine and categorization (e.g. network, storage,
> >>>> hypervisor,
> >>>>>> etc) that permits the deterministic calculation of initialization
> and
> >>>>>> destruction order (i.e. network layer drivers -> storage layer
> >>>> drivers ->
> >>>>>> hypervisor drivers).  Plugin inter-dependencies would be supported
> >>>> between
> >>>>>> plugins sharing the same category.
> >>>>>> - *In-process Installation and Upgrade*: Adding or upgrading a
> driver
> >>>>>> does not require the management server to be restarted.  This
> >>>> capability
> >>>>>> implies a system that supports the simultaneous execution of
> multiple
> >>>>>> driver versions and the ability to suspend execution of work
> >>>> on a
> >>>>>> resource while the underlying driver instance is replaced.
> >>>>>> - *Execution Isolation*: The deployment packaging and execution
> >>>>>> environment supports different (and potentially conflicting)
> versions
> >>>> of
> >>>>>> dependencies to be simultaneously used.  Additionally, plugins would
> >>>> be
> >>>>>> sufficiently sandboxed to protect the management server against
> driver
> >>>>>> instability.
> >>>>>> - *Extension Data Model*: Drivers provide a property bag with a
> >>>>>> metadata descriptor to validate and render vendor specific data.
>  The
> >>>>>> contents of this property bag will be provided to every driver
> operation
> >>>>>> invocation at runtime.  The metadata descriptor would be a
> lightweight
> >>>>>> description that provides a label resource key, a description
> >>>> resource key,
> >>>>>> data type (string, date, number, boolean), required flag, and
> optional
> >>>>>> length limit.
> >>>>>> - *Introspection*: Administrative APIs/UIs allow operators to
> >>>>>> understand which drivers are deployed in the system, their
> >>>>>> configuration, and their current state.
> >>>>>> - *Discoverability*: Optionally, drivers can be discovered via a
> >>>>>> project repository definition (similar to Yum) allowing drivers to
> be
> >>>>>> remotely acquired and operators to be notified regarding update
> >>>>>> availability.  The project would also provide, free of charge,
> >>>> certificates
> >>>>>> to sign plugins.  This mechanism would support local mirroring to
> >>>> support
> >>>>>> air gapped management networks.
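[Editorial sketch] The extension data model described above could be as small as a typed field descriptor plus a validation check. The names below (FieldType, FieldDescriptor) are assumptions, and DATE/STRING format validation is elided:

```java
// Hypothetical data types supported by the metadata descriptor.
enum FieldType { STRING, DATE, NUMBER, BOOLEAN }

// One vendor-specific field: label/description resource keys, data type,
// required flag, and an optional length limit (null = unlimited).
record FieldDescriptor(String labelKey, String descriptionKey,
                       FieldType type, boolean required, Integer maxLength) {

    // Validate a single value from the driver's property bag.
    boolean accepts(String value) {
        if (value == null || value.isEmpty()) {
            return !required;
        }
        if (maxLength != null && value.length() > maxLength) {
            return false;
        }
        switch (type) {
            case NUMBER:
                try {
                    Double.parseDouble(value);
                    return true;
                } catch (NumberFormatException e) {
                    return false;
                }
            case BOOLEAN:
                return "true".equalsIgnoreCase(value) || "false".equalsIgnoreCase(value);
            default:
                return true; // STRING and DATE format checks elided in this sketch
        }
    }
}

class DescriptorDemo {
    public static void main(String[] args) {
        FieldDescriptor minIops = new FieldDescriptor(
                "label.minIops", "desc.minIops", FieldType.NUMBER, true, null);
        System.out.println(minIops.accepts("4000")); // prints true
    }
}
```

The same descriptor drives both server-side validation and generic UI rendering, so vendors never touch core code to expose new settings.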
> >>>>>>
> >>>>>>
> >>>>>> Fundamentally, I do not want to turn CloudStack into an erector set
> with
> >>>>>> more screws than nuts which is a risk with highly pluggable
> >>>> architectures.
> >>>>>> As such, I think we would need to tightly bound the scope of
> drivers and
> >>>>>> their behaviors to prevent the loss of system usability and stability.
>  My
> >>>>>> thinking is that drivers would be packaged into a custom JAR, CAR
> >>>>>> (CloudStack ARchive), that would be structured as follows:
> >>>>>>
> >>>>>>
> >>>>>> - META-INF
> >>>>>>    - MANIFEST.MF
> >>>>>>    - driver.yaml (driver metadata (e.g. version, name, description,
> >>>>>>    etc) serialized in YAML format)
> >>>>>>    - LICENSE (a text file containing the driver's license)
> >>>>>> - lib (driver dependencies)
> >>>>>> - classes (driver implementation)
> >>>>>> - resources (driver message files and potentially JS resources)
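[Editorial sketch] A driver.yaml for such a CAR might look like the following; the field names are illustrative assumptions, since no schema is defined in the proposal:

```yaml
# Hypothetical driver metadata, serialized in YAML as proposed above
name: acme-san-driver
version: 1.2.0
description: Acme SAN primary storage driver
category: storage          # network | storage | hypervisor
vendor: Acme Corp
requires:
  management-server: ">= 4.3.0"
```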
> >>>>>>
> >>>>>>
> >>>>>> The management server would acquire drivers through a simple scan
> of a
> >>>> URL
> >>>>>> (e.g. file directory, S3 bucket, etc).  For every CAR object found,
> the
> >>>>>> management server would create an execution environment (likely a
> >>>> dedicated
> >>>>>> ExecutorService and Classloader), and transition the state of the
> >>>> driver to
> >>>>>> Running (the exact state model would need to be worked out).  To be
> >>>> really
> >>>>>> nice, we could develop a custom Ant task/Maven plugin/Gradle plugin
> to
> >>>>>> create CARs.   I can also imagine opportunities to add hooks to this
> >>>>>> model to register instrumentation information with JMX and to handle
> >>>> authorization.
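[Editorial sketch] The acquisition step above could look like this for the file-directory case (assumed names throughout; the real state model is explicitly left open in the proposal): scan for .car files and give each one its own classloader and executor, which is what makes per-driver isolation, and eventually unloading, tractable.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical execution environment created per discovered CAR.
class DriverEnvironment {
    final ClassLoader classLoader;
    final ExecutorService executor;
    String state = "REGISTERED"; // exact state model TBD, per the proposal

    DriverEnvironment(ClassLoader cl, ExecutorService ex) {
        classLoader = cl;
        executor = ex;
    }
}

class CarScanner {
    // Scan a directory for CAR archives and stand up one isolated
    // environment (dedicated classloader + executor) per archive found.
    static Map<String, DriverEnvironment> scan(File dir) throws Exception {
        Map<String, DriverEnvironment> envs = new HashMap<>();
        File[] cars = dir.listFiles((d, name) -> name.endsWith(".car"));
        if (cars == null) {
            return envs;
        }
        for (File car : cars) {
            // A child-first policy or further sandboxing would be needed for
            // true dependency isolation; parent delegation is used here.
            URLClassLoader loader = new URLClassLoader(
                    new URL[] { car.toURI().toURL() },
                    CarScanner.class.getClassLoader());
            DriverEnvironment env = new DriverEnvironment(
                    loader, Executors.newSingleThreadExecutor());
            env.state = "RUNNING";
            envs.put(car.getName(), env);
        }
        return envs;
    }
}
```

An S3-bucket source would slot in behind the same interface by listing keys instead of files; the per-driver executor is also the natural seam for the actor-style work compartmentalization mentioned below.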
> >>>>>>
> >>>>>> To keep the scope of this email confined, we would introduce the
> general
> >>>>>> notion of a Resource, and (hand wave hand wave) eventually
> >>>> compartmentalize
> >>>>>> the execution of work around a resource [1].  This (hand waved)
> >>>>>> compartmentalization would allow us the controls necessary to
> safely and
> >>>>>> reliably perform in-place driver upgrades.  For an initial release,
> I
> >>>> would
> >>>>>> recommend implementing the abstractions, loading mechanism,
> extension
> >>>> data
> >>>>>> model, and discovery features.  With these capabilities in place, we
> >>>> could
> >>>>>> attack the in-place upgrade model.
> >>>>>>
> >>>>>> If we were to adopt such a pluggable capability, we would have the
> >>>>>> opportunity to decouple the vendor and CloudStack release schedules.
> >>>> For
> >>>>>> example, if a vendor were introducing a new product that required a
> new
> >>>> or
> >>>>>> updated driver, they would no longer need to wait for a CloudStack
> >>>> release
> >>>>>> to support it.  They would also gain the ability to fix high
> priority
> >>>>>> defects in the same manner.
> >>>>>>
> >>>>>> I have hand waved a number of issues that would need to be resolved
> >>>> before
> >>>>>> such an approach could be implemented.  However, I think we need to
> >>>> decide,
> >>>>>> as a community, that it is worth devoting energy and effort to
> enhancing
> >>>> the
> >>>>>> plugin/driver model and the goals of that effort before driving head
> >>>> first
> >>>>>> into the deep rabbit hole of design/implementation.
> >>>>>>
> >>>>>> Thoughts? (/me ducks)
> >>>>>> -John
> >>>>>>
> >>>>>> [1]: My opinions on the matter from CloudStack Collab 2013 ->
> >>>>
> http://www.slideshare.net/JohnBurwell1/how-to-run-from-a-zombie-cloud-stack-distributed-process-management
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> *Mike Tutkowski*
> >>>>> *Senior CloudStack Developer, SolidFire Inc.*
> >>>>> e: mike.tutkow...@solidfire.com
> >>>>> o: 303.746.7302
> >>>>> Advancing the way the world uses the
> >>>>> cloud<http://solidfire.com/solution/overview/?video=play>
> >>>>> *™*
> >>>
> >>>
> >>> --
> >>> *Mike Tutkowski*
> >>> *Senior CloudStack Developer, SolidFire Inc.*
> >>> e: mike.tutkow...@solidfire.com
> >>> o: 303.746.7302
> >>> Advancing the way the world uses the
> >>> cloud<http://solidfire.com/solution/overview/?video=play>
> >>> *™*
> >
>
