+1. The assertions apply not just to autoscaling but to software in general. I hope we can make autoscaling "just simple enough" to work.
The circular Heat <=> Trove example is one of those that does worry me a little. It feels like something is not structured right if it is needed (Rube Goldberg-like). I am not sure what could be done differently; it is just my gut feeling that something is "off".

Sent from my really tiny device...

On Sep 10, 2013, at 8:55 PM, "Adrian Otto" <adrian.o...@rackspace.com> wrote:

> I have a different point of view. First I will offer some assertions:
>
> A-1) We need to keep it simple.
> A-1.1) Systems that are hard to comprehend are hard to debug, and that's bad.
> A-1.2) Complex systems tend to be much more brittle than simple ones.
>
> A-2) Scale-up operations need to be as fast as possible.
> A-2.1) Auto-scaling only works right if your new capacity is added quickly when your controller detects that you need more. If you spend a bunch of time goofing around before actually adding a new resource to a pool when it's under strain, the scaling action arrives too late.
> A-2.2) The fewer network round trips between "add-more-resources-now" and "resources-added" the better. Fewer = less brittle.
>
> A-3) The control logic for scaling different applications varies.
> A-3.1) What metrics are watched may differ between various use cases.
> A-3.2) The data types that represent sensor data may vary.
> A-3.3) The policy that's applied to the metrics (such as max, min, and cooldown period) varies between applications. Not only do the values vary, but also the logic itself.
> A-3.4) A scaling policy may not be just a handful of simple parameters. Ideally it allows configurable logic that the end user can control to some extent.
>
> A-4) Auto-scale operations are usually not orchestrations. They are usually simple linear workflows.
> A-4.1) The TaskFlow project[1] offers a simple way to do workflows and stable state management that can be integrated directly into Autoscale.
> A-4.2) A task flow (workflow) can trigger a Heat orchestration if needed.
>
> Now, a mental tool for thinking about control policies:
>
> Auto-scaling is like steering a car. The control policy says that you want to drive equally between the two lane lines, and that if you drift off center, you gradually correct back toward center again. If the road bends, you try to remain in your lane as the lane lines curve. You try not to weave around in your lane, and you try not to drift out of the lane.
>
> If your controller notices that you are about to drift out of your lane because the road is starting to bend, and you are distracted, or your hands slip off the wheel, you might drift out of your lane into nearby traffic. That's why you don't want a Rube Goldberg machine[2] between you and the steering wheel. See assertions A-1 and A-2.
>
> If you are driving an 18-wheel tractor/trailer truck, steering is different from driving a Fiat. You need to wait longer and steer toward the outside of curves so your trailer does not lag on the inside of the curve as you correct for a bend in the road. When you are driving the Fiat, you may want to aim for the middle of the lane at all times, possibly even apexing bends to reduce your driving distance, which is actually the opposite of what truck drivers need to do. Control policies apply to other parts of driving too. I want a different policy for braking than I use for steering. On some vehicles I go through a gear-shifting workflow, and on others I don't. See assertion A-3.
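[To make A-4 concrete, here is a minimal sketch of the kind of linear workflow it describes, using TaskFlow's linear_flow pattern. The task names and bodies are hypothetical illustrations, not actual Autoscale code.]

    # Hypothetical linear scale-up flow built with TaskFlow.
    from taskflow import engines, task
    from taskflow.patterns import linear_flow

    class LaunchInstance(task.Task):
        """Ask Nova for one more instance (placeholder body)."""
        default_provides = 'server_id'

        def execute(self, flavor, image):
            # A real task would call novaclient here and return the new ID.
            return 'hypothetical-server-id'

    class AddToPool(task.Task):
        """Register the new instance with the pool (placeholder body)."""
        def execute(self, server_id):
            pass

    # No orchestration graph -- just an ordered list of tasks, with
    # TaskFlow persisting state between them.
    flow = linear_flow.Flow('scale-up').add(LaunchInstance(), AddToPool())
    engines.run(flow, store={'flavor': 'm1.small', 'image': 'cirros-0.3'})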
>
> So, I don't intend to argue the technical minutiae of each design point, but I challenge you to make sure that we arrive at a simple system that (1) any OpenStack user can comprehend, (2) responds quickly to alarm stimulus, (3) is unlikely to fail, and (4) can be easily customized with user-supplied logic that controls how the scaling happens, and under what conditions.
>
> It would be better if we could explain Autoscale like this:
>
> Heat -> Autoscale -> Nova, etc.
> -or-
> User -> Autoscale -> Nova, etc.
>
> This approach allows use cases where (for whatever reason) the end user does not want to use Heat at all, but still wants something simple to be auto-scaled for them. Nobody would be scratching their heads wondering why things are going in circles.
>
> From an implementation perspective, that means the auto-scale service needs at least a simple linear workflow capability that can trigger a Heat orchestration when there is a good reason to. This way, the typical use cases don't have anything resembling circular dependencies. The source of truth for how many members are currently in an Autoscaling group should be the Autoscale service, not the Heat database. If you want to expose that in list-stack-resources output, then have Heat call out to the Autoscale service to fetch that figure as needed. It is irrelevant to orchestration. Code does not need to be duplicated: both Autoscale and Heat can use the exact same source code files for the code that launches/terminates instances of resources.
>
> References:
> [1] https://wiki.openstack.org/wiki/TaskFlow
> [2] http://en.wikipedia.org/wiki/Rube_Goldberg_machine
>
> Thanks,
>
> Adrian
>
>
> On Aug 16, 2013, at 11:36 AM, Zane Bitter <zbit...@redhat.com> wrote:
>
>> On 16/08/13 00:50, Christopher Armstrong wrote:
>>> *Introduction and Requirements*
>>>
>>> So there's kind of a perfect storm happening around autoscaling in Heat right now. It's making it really hard to figure out how I should compose this email. There are a lot of different requirements, a lot of different cool ideas, and a lot of projects that want to take advantage of autoscaling in one way or another: Trove, OpenShift, TripleO, just to name a few...
>>>
>>> I'll try to list the requirements from various people/projects that may be relevant to autoscaling or scaling in general.
>>>
>>> 1. Some users want a service like Amazon's Auto Scaling or Rackspace's Otter -- a simple API that doesn't really involve orchestration.
>>> 2. If such an API exists, it makes sense for Heat to take advantage of its functionality instead of reimplementing it.
>>
>> +1, obviously. But the other half of the story is that the API is likely to be implemented using Heat on the back end, amongst other reasons because that implementation already exists. (As you know, since you wrote it ;)
>>
>> So, just as we will have an RDS resource in Heat that calls Trove, and Trove will use Heat for orchestration:
>>
>> user => [Heat =>] Trove => Heat => Nova
>>
>> there will be a similar workflow for Autoscaling:
>>
>> user => [Heat =>] Autoscaling -> Heat => Nova
>>
>> where the first, optional, Heat stack contains the RDS/Autoscaling resource and the backend Heat stack contains the actual Nova instance(s).
>>
>> One difference might be that the Autoscaling -> Heat step need not happen via the public ReST API. Since both are part of the Heat project, I think it would also be OK to do this over RPC only.
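[For a concrete picture of that Autoscaling -> Heat step, here is a minimal sketch over the public ReST API using python-heatclient. The endpoint, token, stack name, and 'size' parameter are illustrative assumptions; an RPC-only variant would carry the same information without going through the public API.]

    # Hypothetical sketch: the Autoscaling service resizing its backend stack.
    from heatclient.client import Client

    heat = Client('1', 'http://heat.example.com:8004/v1/TENANT_ID',
                  token='AUTH_TOKEN')

    stack_id = 'backend-stack'  # the nested stack holding the Nova servers

    # Reuse the stack's current template and ask Heat to converge on a new
    # size; Heat computes the delta and talks to Nova.
    template = heat.stacks.template(stack_id)
    heat.stacks.update(stack_id, template=template,
                       parameters={'size': 5})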
>>> 3. If Heat integrates with that separate API, however, that API will need two ways to do its work:
>>
>> Wut?
>>
>>> 1. native instance-launching functionality, for the "simple" use
>>
>> This is just the simplest possible case of 3.2. Why would we maintain a completely different implementation?
>>
>>> 2. a way to talk back to Heat to perform orchestration-aware scaling operations.
>>
>> [IRC discussions clarified this to mean scaling arbitrary resource types, rather than just Nova servers.]
>>
>>> 4. There may be things that are different than AWS::EC2::Instance that we would want to scale (I have personally been playing around with the concept of a ResourceGroup, which would maintain a nested stack of resources based on an arbitrary template snippet).
>>> 5. Some people would like to be able to perform manual operations on an instance group -- such as Clint Byrum's recent example of "remove instance 4 from resource group A".
>>>
>>> Please chime in with your additional requirements if you have any! Trove and TripleO people, I'm looking at you :-)
>>>
>>>
>>> *TL;DR*
>>>
>>> Point 3.2 above is the main point of this email: exactly how should the autoscaling API talk back to Heat to tell it to add more instances? I included the other points so that we keep them in mind while considering a solution.
>>>
>>> *Possible Solutions*
>>>
>>> I have heard at least three possibilities so far:
>>>
>>> 1. The autoscaling API should maintain a full template of all the nodes in the autoscaled nested stack, manipulate it locally when it wants to add or remove instances, and post an update-stack to the nested stack associated with the InstanceGroup.
>>
>> This is what I had been thinking.
>>
>>> Pros: It doesn't require any changes to Heat.
>>>
>>> Cons: It puts a lot of the burden of state management on the autoscale API,
>>
>> All other APIs need to manage state too; I don't really have a problem with that. It already has to handle e.g. the cooldown state; your scaling strategy (uh, for the service) will be determined by that.
>>
>>> and it arguably spreads out the responsibility of "orchestration" to the autoscale API.
>>
>> Another line of argument would be that this is not true by definition ;)
>>
>>> Also arguable is that automated agents outside of Heat shouldn't be managing an "internal" template, which is typically developed by devops people and kept in version control.
>>>
>>> 2. There should be a new custom-built API for doing exactly what the autoscaling service needs on an InstanceGroup, named something unashamedly specific -- like "instance-group-adjust".
>>
>> +1 to having a custom (RPC-only) API if it means forcing some state out of the autoscaling service.
>>
>> -1 for it talking to an InstanceGroup - that just brings back all our old problems about having "resources" that don't have their own separate state and APIs, but just exist inside of Heat plugins. Those are the cause of all of the biggest design problems in Heat. They're the thing I want the Autoscaling API to get rid of. (Also, see below.)
>>
>>> Pros: It'll do exactly what it needs to do for this use case; very little state management in the autoscale API; it lets Heat do all the orchestration and gives only very specific delegation to the external autoscale API.
>>>
>>> Cons: The API grows an additional method for a specific use case.
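[A minimal sketch of what option 2's RPC-only call might look like. "instance_group_adjust" is the method being proposed here, not one that exists; oslo.messaging is used purely for illustration.]

    # Hypothetical RPC-only client call for option 2; the method name and
    # arguments are the proposal under discussion, not an existing API.
    from oslo_config import cfg
    import oslo_messaging as messaging

    transport = messaging.get_transport(cfg.CONF)
    target = messaging.Target(topic='engine', version='1.0')
    client = messaging.RPCClient(transport, target)

    # Ask the Heat engine to grow a specific group by a delta, leaving all
    # other orchestration state inside Heat.
    client.call({}, 'instance_group_adjust',
                group_id='my-scaling-group', adjustment=2)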
>>> 3. The autoscaling API should update the "Size" Property of the InstanceGroup resource in the stack that it is placed in. This would require the ability to PATCH a specific piece of a template (an operation isomorphic to update-stack).
>>
>> -1
>>
>> Half the point of a separate autoscaling API is that an autoscaling group shouldn't _need_ to exist only in the context of a template.
>>
>> So, to get around this I think you're proposing something where the two ways of interacting with Autoscaling are (with and without Heat):
>>
>> user => Heat <=> Autoscaling
>>                      \
>>                       ------------------> Heat => Nova
>>
>> user => Autoscaling => Heat -> Heat => Nova
>>
>> Instead of:
>>
>> user => Heat => Autoscaling => Heat => Nova
>>
>> user =========> Autoscaling => Heat => Nova
>>
>> This is two lots of code to write & test and, from the user's point of view, a bizarre and unprecedented conceptual inversion of where Orchestration belongs in the system. (Note that the backend Heat is an implementation detail hidden from the user in either case.)
>>
>> cheers,
>> Zane.
>>
>>> Pros: The API modification is generic, simply a more optimized version of update-stack; very little state management is required in the autoscale API.
>>>
>>> Cons: This would essentially require manipulating the user-provided template (unless we have a concept of "private properties", which perhaps wouldn't appear in the template as provided by the user, but could be manipulated with such an update-stack operation?).
>>>
>>>
>>> *Addenda*
>>>
>>> Keep in mind that there are use cases which require other types of manipulation of the InstanceGroup -- not just the autoscaling API. For example, see Clint's #5 above.
>>>
>>>
>>> Also, about implementation: Andrew Plunk and I have begun work on Heat resources for Rackspace's Otter, which I think will be a really good proof of concept for how this stuff should work in the Heat-native autoscale API. I am trying to gradually work the design into the native Heat autoscaling design, and we will need to solve the autoscale-controlling-InstanceGroup issue soon.
>>>
>>> --
>>> IRC: radix
>>> Christopher Armstrong
>>> Rackspace

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev