+1. The assertions apply not just to autoscaling but to software in general. I hope we can make autoscaling "just simple enough" to work.
The circular Heat <=> Trove example is one of those that does worry me a little. It feels like something is not structured right if it is needed (Rube Goldberg-like). I am not sure what could be done differently; it is just my gut feeling that something is "off".

Sent from my really tiny device...

On Sep 10, 2013, at 8:55 PM, "Adrian Otto" <adrian.o...@rackspace.com> wrote:

> I have a different point of view. First I will offer some assertions:
>
> A-1) We need to keep it simple.
> A-1.1) Systems that are hard to comprehend are hard to debug, and that's bad.
> A-1.2) Complex systems tend to be much more brittle than simple ones.
>
> A-2) Scale-up operations need to be as fast as possible.
> A-2.1) Auto-scaling only works right if your new capacity is added quickly when your controller detects that you need more. If you spend a bunch of time goofing around before actually adding a new resource to a pool when it's under strain, the scaling action arrives too late.
> A-2.2) The fewer network round trips between "add-more-resources-now" and "resources-added" the better. Fewer = less brittle.
>
> A-3) The control logic for scaling different applications varies.
> A-3.1) What metrics are watched may differ between various use cases.
> A-3.2) The data types that represent sensor data may vary.
> A-3.3) The policy that's applied to the metrics (such as max, min, and cooldown period) varies between applications. Not only do the values vary, but also the logic itself.
> A-3.4) A scaling policy may not be just a handful of simple parameters. Ideally it allows configurable logic that the end user can control to some extent.
>
> A-4) Auto-scale operations are usually not orchestrations. They are usually simple linear workflows.
> A-4.1) The TaskFlow project[1] offers a simple way to do workflows and stable state management that can be integrated directly into Autoscale.
> A-4.2) A task flow (workflow) can trigger a Heat orchestration if needed.
>
> Now, a mental tool for thinking about control policies:
>
> Auto-scaling is like steering a car. The control policy says that you want to drive equally between the two lane lines, and that if you drift off center, you gradually correct back toward center again. If the road bends, you try to remain in your lane as the lane lines curve. You try not to weave around in your lane, and you try not to drift out of the lane.
>
> If your controller notices that you are about to drift out of your lane because the road is starting to bend, and you are distracted, or your hands slip off the wheel, you might drift out of your lane into nearby traffic. That's why you don't want a Rube Goldberg machine[2] between you and the steering wheel. See assertions A-1 and A-2.
>
> If you are driving an 18-wheel tractor/trailer truck, steering is different from driving a Fiat. You need to wait longer and steer toward the outside of curves so your trailer does not lag on the inside of the curve as you correct for a bend in the road. When you are driving the Fiat, you may want to aim for the middle of the lane at all times, possibly even apexing bends to reduce your driving distance, which is actually the opposite of what truck drivers need to do. Control policies apply to other parts of driving too. I want a different policy for braking than I use for steering. On some vehicles I go through a gear-shifting workflow, and on others I don't. See assertion A-3.
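[To make A-4 concrete, here is a minimal sketch of the kind of linear workflow it describes, using TaskFlow's linear_flow pattern. The task names and bodies are hypothetical illustrations, not actual Autoscale code.]

    # Hypothetical linear scale-up flow built with TaskFlow.
    from taskflow import engines, task
    from taskflow.patterns import linear_flow

    class LaunchInstance(task.Task):
        """Ask Nova for one more instance (placeholder body)."""
        default_provides = 'server_id'

        def execute(self, flavor, image):
            # A real task would call novaclient here and return the new ID.
            return 'hypothetical-server-id'

    class AddToPool(task.Task):
        """Register the new instance with the pool (placeholder body)."""
        def execute(self, server_id):
            pass

    # No orchestration graph -- just an ordered list of tasks, with
    # TaskFlow persisting state between them.
    flow = linear_flow.Flow('scale-up').add(LaunchInstance(), AddToPool())
    engines.run(flow, store={'flavor': 'm1.small', 'image': 'cirros-0.3'})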
>
> So, I don't intend to argue the technical minutiae of each design point, but I challenge you to make sure that we arrive at a simple system that (1) any OpenStack user can comprehend, (2) responds quickly to alarm stimulus, (3) is unlikely to fail, and (4) can be easily customized with user-supplied logic that controls how the scaling happens, and under what conditions.
>
> It would be better if we could explain Autoscale like this:
>
> Heat -> Autoscale -> Nova, etc.
> -or-
> User -> Autoscale -> Nova, etc.
>
> This approach allows use cases where (for whatever reason) the end user does not want to use Heat at all, but still wants something simple to be auto-scaled for them. Nobody would be scratching their heads wondering why things are going in circles.
>
> From an implementation perspective, that means the auto-scale service needs at least a simple linear workflow capability that can trigger a Heat orchestration when there is a good reason to. This way, the typical use cases don't have anything resembling circular dependencies. The source of truth for how many members are currently in an Autoscaling group should be the Autoscale service, not the Heat database. If you want to expose that in list-stack-resources output, then have Heat call out to the Autoscale service to fetch that figure as needed. It is irrelevant to orchestration. Code does not need to be duplicated: both Autoscale and Heat can use the exact same source code files for the code that launches/terminates instances of resources.
>
> References:
> [1] https://wiki.openstack.org/wiki/TaskFlow
> [2] http://en.wikipedia.org/wiki/Rube_Goldberg_machine
>
> Thanks,
>
> Adrian
>
>
> On Aug 16, 2013, at 11:36 AM, Zane Bitter <zbit...@redhat.com> wrote:
>
>> On 16/08/13 00:50, Christopher Armstrong wrote:
>>> *Introduction and Requirements*
>>>
>>> So there's kind of a perfect storm happening around autoscaling in Heat right now. It's making it really hard to figure out how I should compose this email. There are a lot of different requirements, a lot of different cool ideas, and a lot of projects that want to take advantage of autoscaling in one way or another: Trove, OpenShift, TripleO, just to name a few...
>>>
>>> I'll try to list the requirements from various people/projects that may be relevant to autoscaling or scaling in general.
>>>
>>> 1. Some users want a service like Amazon's Auto Scaling or Rackspace's Otter -- a simple API that doesn't really involve orchestration.
>>> 2. If such an API exists, it makes sense for Heat to take advantage of its functionality instead of reimplementing it.
>>
>> +1, obviously. But the other half of the story is that the API is likely to be implemented using Heat on the back end, amongst other reasons because that implementation already exists. (As you know, since you wrote it ;)
>>
>> So, just as we will have an RDS resource in Heat that calls Trove, and Trove will use Heat for orchestration:
>>
>> user => [Heat =>] Trove => Heat => Nova
>>
>> there will be a similar workflow for Autoscaling:
>>
>> user => [Heat =>] Autoscaling -> Heat => Nova
>>
>> where the first, optional, Heat stack contains the RDS/Autoscaling resource and the backend Heat stack contains the actual Nova instance(s).
>>
>> One difference might be that the Autoscaling -> Heat step need not happen via the public ReST API. Since both are part of the Heat project, I think it would also be OK to do this over RPC only.
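[For a concrete picture of that Autoscaling -> Heat step, here is a minimal sketch over the public ReST API using python-heatclient. The endpoint, token, stack name, and 'size' parameter are illustrative assumptions; an RPC-only variant would carry the same information without going through the public API.]

    # Hypothetical sketch: the Autoscaling service resizing its backend stack.
    from heatclient.client import Client

    heat = Client('1', 'http://heat.example.com:8004/v1/TENANT_ID',
                  token='AUTH_TOKEN')

    stack_id = 'backend-stack'  # the nested stack holding the Nova servers

    # Reuse the stack's current template and ask Heat to converge on a new
    # size; Heat computes the delta and talks to Nova.
    template = heat.stacks.template(stack_id)
    heat.stacks.update(stack_id, template=template,
                       parameters={'size': 5})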
>>> 3. If Heat integrates with that separate API, however, that API will need two ways to do its work:
>>
>> Wut?
>>
>>> 1. native instance-launching functionality, for the "simple" use
>>
>> This is just the simplest possible case of 3.2. Why would we maintain a completely different implementation?
>>
>>> 2. a way to talk back to Heat to perform orchestration-aware scaling operations.
>>
>> [IRC discussions clarified this to mean scaling arbitrary resource types, rather than just Nova servers.]
>>
>>> 4. There may be things that are different than AWS::EC2::Instance that we would want to scale (I have personally been playing around with the concept of a ResourceGroup, which would maintain a nested stack of resources based on an arbitrary template snippet).
>>> 5. Some people would like to be able to perform manual operations on an instance group -- such as Clint Byrum's recent example of "remove instance 4 from resource group A".
>>>
>>> Please chime in with your additional requirements if you have any! Trove and TripleO people, I'm looking at you :-)
>>>
>>>
>>> *TL;DR*
>>>
>>> Point 3.2 above is the main point of this email: exactly how should the autoscaling API talk back to Heat to tell it to add more instances? I included the other points so that we keep them in mind while considering a solution.
>>>
>>> *Possible Solutions*
>>>
>>> I have heard at least three possibilities so far:
>>>
>>> 1. The autoscaling API should maintain a full template of all the nodes in the autoscaled nested stack, manipulate it locally when it wants to add or remove instances, and post an update-stack to the nested stack associated with the InstanceGroup.
>>
>> This is what I had been thinking.
>>
>>> Pros: It doesn't require any changes to Heat.
>>>
>>> Cons: It puts a lot of the burden of state management on the autoscale API,
>>
>> All other APIs need to manage state too; I don't really have a problem with that. It already has to handle e.g. the cooldown state; your scaling strategy (uh, for the service) will be determined by that.
>>
>>> and it arguably spreads out the responsibility of "orchestration" to the autoscale API.
>>
>> Another line of argument would be that this is not true by definition ;)
>>
>>> Also arguable is that automated agents outside of Heat shouldn't be managing an "internal" template, which is typically developed by devops people and kept in version control.
>>>
>>> 2. There should be a new custom-built API for doing exactly what the autoscaling service needs on an InstanceGroup, named something unashamedly specific -- like "instance-group-adjust".
>>
>> +1 to having a custom (RPC-only) API if it means forcing some state out of the autoscaling service.
>>
>> -1 for it talking to an InstanceGroup - that just brings back all our old problems about having "resources" that don't have their own separate state and APIs, but just exist inside of Heat plugins. Those are the cause of all of the biggest design problems in Heat. They're the thing I want the Autoscaling API to get rid of. (Also, see below.)
>>
>>> Pros: It'll do exactly what it needs to do for this use case; very little state management in the autoscale API; it lets Heat do all the orchestration and gives only very specific delegation to the external autoscale API.
>>>
>>> Cons: The API grows an additional method for a specific use case.
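[A minimal sketch of what option 2's RPC-only call might look like. "instance_group_adjust" is the method being proposed here, not one that exists; oslo.messaging is used purely for illustration.]

    # Hypothetical RPC-only client call for option 2; the method name and
    # arguments are the proposal under discussion, not an existing API.
    from oslo_config import cfg
    import oslo_messaging as messaging

    transport = messaging.get_transport(cfg.CONF)
    target = messaging.Target(topic='engine', version='1.0')
    client = messaging.RPCClient(transport, target)

    # Ask the Heat engine to grow a specific group by a delta, leaving all
    # other orchestration state inside Heat.
    client.call({}, 'instance_group_adjust',
                group_id='my-scaling-group', adjustment=2)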
>>> 3. The autoscaling API should update the "Size" Property of the InstanceGroup resource in the stack that it is placed in. This would require the ability to PATCH a specific piece of a template (an operation isomorphic to update-stack).
>>
>> -1
>>
>> Half the point of a separate autoscaling API is that an autoscaling group shouldn't _need_ to exist only in the context of a template.
>>
>> So, to get around this I think you're proposing something where the two ways of interacting with Autoscaling are (with and without Heat):
>>
>> user => Heat <=> Autoscaling
>>                      \
>>                       ------------------> Heat => Nova
>>
>> user => Autoscaling => Heat -> Heat => Nova
>>
>> Instead of:
>>
>> user => Heat => Autoscaling => Heat => Nova
>>
>> user =========> Autoscaling => Heat => Nova
>>
>> This is two lots of code to write & test and, from the user's point of view, a bizarre and unprecedented conceptual inversion of where Orchestration belongs in the system. (Note that the backend Heat is an implementation detail hidden from the user in either case.)
>>
>> cheers,
>> Zane.
>>
>>> Pros: The API modification is generic, simply a more optimized version of update-stack; very little state management is required in the autoscale API.
>>>
>>> Cons: This would essentially require manipulating the user-provided template (unless we have a concept of "private properties", which perhaps wouldn't appear in the template as provided by the user, but could be manipulated with such an update-stack operation?).
>>>
>>>
>>> *Addenda*
>>>
>>> Keep in mind that there are use cases which require other types of manipulation of the InstanceGroup -- not just the autoscaling API. For example, see Clint's #5 above.
>>>
>>>
>>> Also, about implementation: Andrew Plunk and I have begun work on Heat resources for Rackspace's Otter, which I think will be a really good proof of concept for how this stuff should work in the Heat-native autoscale API. I am trying to gradually work the design into the native Heat autoscaling design, and we will need to solve the autoscale-controlling-InstanceGroup issue soon.
>>>
>>> --
>>> IRC: radix
>>> Christopher Armstrong
>>> Rackspace

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev