tl;dr Containers represent a massive, and also mandatory, opportunity for TripleO. Let's start thinking about ways that we can take maximum advantage to achieve the goals of the project.

Now that you have the tl;dr I'm going to start from the beginning, so settle in and grab yourself a cup of coffee or other poison of your choice.

After working on developing Heat from the very beginning of the project in early 2012 and debugging a bunch of TripleO deployments in the field, it is my considered opinion that Heat is a poor fit for the workloads that TripleO is currently asking of it. To illustrate why, I need to explain what it is that Heat is really designed to do.

Here's a theoretical example of how I've always imagined Heat software deployments would make Heat users' lives better. For simplicity, I'm just going to model two software components, a user-facing service that connects to some back-end service:

  resources:
    backend_component:
      type: OS::Heat::SoftwareComponent
      properties:
        configs:
          - tool: script
            actions:
              - CREATE
              - UPDATE
            config: |
              PORT=$(get_backend_port || random_port)
              stop_backend
              start_backend $DEPLOY_VERSION $PORT $CONFIG
              addr="$(hostname):$(get_backend_port)"
              printf '%s' "$addr" >${heat_outputs_path}.host_and_port
          - tool: script
            actions:
              - DELETE
            config: |
              stop_backend
        inputs:
          - name: DEPLOY_VERSION
          - name: CONFIG
        outputs:
          - name: host_and_port

    frontend_component:
      type: OS::Heat::SoftwareComponent
      properties:
        configs:
          - tool: script
            actions:
              - CREATE
              - UPDATE
            config: |
              stop_frontend
              start_frontend $DEPLOY_VERSION $BACKEND_ADDR $CONFIG
          - tool: script
            actions:
              - DELETE
            config: |
              stop_frontend
        inputs:
          - name: DEPLOY_VERSION
          - name: BACKEND_ADDR
          - name: CONFIG

    backend:
      type: OS::Heat::SoftwareDeployment
      properties:
        server: {get_resource: backend_server}
        name: {get_param: backend_version} # Forces upgrade replacement
        actions: [CREATE, UPDATE, DELETE]
        config: {get_resource: backend_component}
        input_values:
          DEPLOY_VERSION: {get_param: backend_version}
          CONFIG: {get_param: backend_config}

    frontend:
      type: OS::Heat::SoftwareDeployment
      properties:
        server: {get_resource: frontend_server}
        name: {get_param: frontend_version} # Forces upgrade replacement
        actions: [CREATE, UPDATE, DELETE]
        config: {get_resource: frontend_component}
        input_values:
          DEPLOY_VERSION: {get_param: frontend_version}
          BACKEND_ADDR: {get_attr: [backend, host_and_port]}
          CONFIG: {get_param: frontend_config}


This is actually quite a beautiful system, if I may say so:

- Whenever a version changes, Heat knows to update that component, and the components can be updated independently.
- If the backend in this example restarts on a different port, the frontend is updated to point to the new port.
- Everything is completely agnostic as to which server it is running on. They could be running on the same server or different servers.
- Everything is integrated with the infrastructure (not only the servers you're deploying on and the networks and volumes connected to them, but also things like load balancers), so everything is created at the right time, in parallel where possible, and any errors are reported all in one place.
- If something requires e.g. a restart after changing another component, we can encode that. And if it doesn't, we can encode that too.
- There's next to no downtime required: if e.g. we upgrade the backend, we first deploy a new one listening on a new port, then update the frontend to point to the new port, then finally shut down the old backend. Again, we can choose when we want this and when we just want to update in place and reload.
- The application doesn't even need to worry about versioning the protocol that its two constituent parts communicate over: as long as the backend_version and frontend_version that we pass are always compatible, only compatible versions of the two services ever talk to each other.
- If anything at all fails at any point before, during or after this part of the template, Heat can automatically roll everything back into the exact same state it was in before, without any outside intervention. You can insert test deployments that check everything is working and have them automatically roll back if it's not, all with no downtime for users.

So you can use this to do something like a fancier version of blue-green deployment,[1] where you're actually rolling out the (virtualised) hardware and infrastructure in a blue-green fashion along with the software. Not only that, you can choose to replace your whole stack or only parts of it. (Note: the way I had to encode this in the example above, by changing the deployment name so that it forces a resource replacement, is a hack. We really need a feature to specify in a software config resource which inputs should result in a replacement on change.)
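
To make that concrete, here is roughly what a backend-only upgrade could look like as an environment file for the template above (a sketch; the version strings are purely illustrative):

  parameter_defaults:
    backend_version: "1.3.0"   # changed from, say, "1.2.0"; since this is also
                               # the deployment name, Heat replaces the backend
                               # deployment (creating the new one before
                               # deleting the old)
    frontend_version: "2.1.0"  # unchanged; the frontend deployment is merely
                               # updated in place when its BACKEND_ADDR input
                               # changes

Because replacing the backend changes its host_and_port output, the dataflow automatically carries the new address through to the frontend's update.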

It's worth noting that in practice you really, really want everything deployed in containers to make this process work consistently, even though *in theory* you could make this work (briefly) without them. In particular, rollback without containers is a dicey proposition. When we first started talking about implementing software deployments in Heat I half-seriously suggested that maybe we should make containers the only allowed type of software deployment, and I kind of wonder now if I shouldn't have pressed harder on that point.


In any event, unfortunately as everyone involved in TripleO knows, the way TripleO uses Heat looks nothing like this. It actually looks more like this:

  resources:
    install_all_the_things_on_one_server_config:
      type: OS::Heat::SoftwareConfig
      properties:
        actions: [CREATE]
        config: {get_file: install_all_the_things_on_one_server.sh}

    update_all_the_things_on_one_server_config:
      type: OS::Heat::SoftwareConfig
      properties:
        actions: [UPDATE]
        config: {get_file: update_all_the_things_on_one_server.sh}
        inputs:
          - name: update_count

    ...

(Filling in the rest is left as an exercise for the reader. You're welcome.)

Not illustrated are the multiple sources of truth that we have: puppet modules (packaged on the server), puppet manifests and hieradata (delivered via Heat), and external package repositories. Heat is a dataflow language, but much of the data it should be operating on is actually hidden from it. That's going about as well as you might expect.
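
To illustrate the pattern (a simplified sketch; the manifest path is made up, but the shape matches how TripleO drives Puppet through Heat):

    controller_puppet_config:
      type: OS::Heat::SoftwareConfig
      properties:
        group: puppet
        # Only the manifest itself flows through Heat as data. The modules it
        # applies and the packages those modules install come from the server
        # image and external repositories, so their versions are invisible to
        # Heat's dataflow.
        config: {get_file: manifests/overcloud_controller.pp}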

Due to the impossibility of ever rolling back a deployment like one of those, we just disable rollback for the overcloud templates, so if there's a failure we end up stuck in whatever intermediate state we were in when the script died. That can leave things in a state where recovery is not automatic when 'earlier' deployments (like the package update) end up depending on state set up by 'later' deployments (like the post-deployment scripts, which manipulate Pacemaker's state in Pacemaker-based deployments). Even worse, many of the current scripts leave the machine in a state that requires manual recovery should they fail part-way through.

Indeed, this has literally none of the benefits of the ideal Heat deployment enumerated above save one: it may be entirely the wrong tool in every way for the job it's being asked to do, but at least it is still well-integrated with the rest of the infrastructure.

Now, at the Mitaka summit we discussed the idea of a 'split stack', where we have one stack for the infrastructure and a separate one for the software deployments, so that there is no longer any tight integration between infrastructure and software. Although it makes me a bit sad in some ways, I can certainly appreciate the merits of the idea as well. However, from the argument above we can deduce that if this is the *only* thing we do then we will end up in the very worst of all possible worlds: the wrong tool for the job, poorly integrated. Every single advantage of using Heat to deploy software will have evaporated, leaving only disadvantages.
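
For clarity, the split-stack shape would be roughly this (a sketch; all names are illustrative):

  # Infrastructure stack: servers, networks, volumes; exposes the servers.
  outputs:
    controller_servers:
      value: {get_attr: [controller_group, refs]}

  # Software stack: takes the server IDs as a parameter, and points its
  # software deployments at machines it did not itself create.
  parameters:
    controller_servers:
      type: comma_delimited_list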

So what would be a good alternative? And how would we evaluate the options?


To my mind, the purpose of the TripleO project is this: to ensure that there is an OpenStack community collaborating around each part of the OpenStack installation/management story. We don't care about TripleO "owning" that part (all things being equal, we'd prefer not to), just that nobody should have to go outside the OpenStack community and/or roll their own thing to install OpenStack unless they want to. So I think the ability to sustain a community around whatever solution we choose ought to be a primary consideration.

The use of Ironic has been something of a success story here. There's only one place to add hardware support to enable both installing OpenStack itself on bare-metal via TripleO and the 'regular' bare-metal-to-tenant use case of Ironic. This is a clear win/win.

Beyond getting the bare-metal machines marshalled, the other part of the solution is configuration management and orchestration of the various software services. When TripleO started there was nowhere in OpenStack that was defining the relationships between services needed to orchestrate them. To a large extent there still isn't. I think that one of the reasons we adopted Puppet in TripleO was that it was supposed to provide this, at least within a limited scope (i.e. on one machine - the puppet-deploying community is largely using Ansible to orchestrate across boxes, and we are using Heat). However, what we've discovered in the past few months is that Puppet is actually not able to fulfil this role as long as we support Pacemaker-based deployments as an option, because in that case Pacemaker actually has control of starting and stopping all of the services. As a result we are back to defining it all ourselves in the Pacemaker config plus various hacky shell scripts, instead of relying on (and contributing to!) a larger community. Even ignoring that, Puppet doesn't solve the problem of orchestrating across multiple machines.


Clearly one option would be to encode everything in Heat along the lines of the first example above. I think once we have containers this could actually work really well for compute nodes and other types of scale-out nodes (e.g. Swift nodes). The scale-out model of Heat scaling groups works really well for this use case, and between the improvements we have put in place (like batched updates and user hooks) and those still on the agenda (like notifications + automatic Mistral workflow triggering on hooks) Heat could provide a really good way of capturing things like migrating user workloads on scale down and rolling updates in the templates, so that they can be managed completely automatically by the undercloud with no client involvement (and when the undercloud becomes HA, they'll get HA for free). I'd be pretty excited to see this tried. The potential downside is that the orchestration definitions are still trapped inside the TripleO templates, so they're not being shared outside of the TripleO community. This is probably justified though owing to its close ties to the underlying infrastructure.
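
As a sketch of the building blocks involved (the property values and the nested template name are illustrative):

    compute_group:
      type: OS::Heat::ResourceGroup
      update_policy:
        rolling_update:
          max_batch_size: 5  # batched updates: touch at most 5 nodes at a time
          pause_time: 60     # seconds to wait between batches
      properties:
        count: {get_param: compute_count}
        resource_def:
          type: compute-role.yaml  # hypothetical nested template for one node

  # ...and in the environment, a user hook on every group member; this is
  # where notifications plus automatic Mistral workflow triggering (e.g. to
  # migrate workloads off a node before updating it) could eventually plug in:
  resource_registry:
    resources:
      compute_group:
        '*':
          hooks: pre-update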

An alternative out of left field: as far as I can gather, the "completely new way of orchestrating activities" used by the new Puppet Application Orchestration thing[2] uses substantially the same model as I described for Heat above. If we added Puppet Application Orchestration data to openstack-puppet-modules then it may be possible to write a tool to generate Heat templates from that data. However, in talking with Emilien it sounds like o-p-m is quite some time away from tackling PAO, so I don't think this is really feasible.

In any event, it's when we get to the controller nodes that the downsides become more pronounced. We're no longer talking about one deployment per service like I sketched above; each service is actually multiple deployments forming an active-active cluster with virtual IPs and failover and all that jazz. It may be that everything would just work the same way, but we would be in uncharted territory and there would likely be unanticipated subtleties. It's particularly unclear how we would handle stop-the-world database migrations in this model, although we do have the option of hoping that stop-the-world database migrations will have been completely phased out by then.
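
(For what it's worth, the fan-out part is already expressible: Heat can target a single config at a whole cluster with a deployment group. A sketch, where controller_server_map is an assumed parameter mapping logical names to Nova server IDs:)

    api_cluster:
      type: OS::Heat::SoftwareDeploymentGroup
      properties:
        config: {get_resource: api_component}
        # hypothetical map of logical names to server IDs for the controllers
        servers: {get_param: controller_server_map}

That covers deploying the same software everywhere; the virtual IPs and failover are the genuinely uncharted part.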

To make it even more complicated, we ultimately want the services to be spread heterogeneously among controller nodes in a configurable way. I believe that Dan's work on composable roles has already gone some way toward this without even using containers, but it's likely to become increasingly difficult to model in Heat without some sort of template generation. (I personally think that template generation would be a Good Thing, but we've chosen not to go down that path so far.) Quite possibly even just having composable roles could make it untenable to continue maintaining separate Pacemaker and non-Pacemaker deployment modes. It'd be really nice to have the flexibility to do things like scale out different services at different rates. What's more, we are going to need some way of redistributing services when a machine in the cluster fails, and ultimately we would like that process to be automated, which would *require* a template generation service.

We certainly *could* build all of that. But we definitely shouldn't, because this is the kind of thing that services like Kubernetes and Apache Mesos are designed to do already. And that raises another possibility: Angus & friends are working on capturing the orchestration relationships for Mesos+Marathon within the Kolla project (specifically, in the kolla-mesos repository). This represents a tremendous opportunity for the TripleO project to further its mission of having the same deployment tools available to everyone as an official part of the OpenStack project without having to maintain them separately.

As of the Liberty release, Magnum now supports provisioning Mesos clusters, so TripleO wouldn't have to maintain the installer for that either. (The choice of Mesos is somewhat unfortunate in our case, because Magnum's Kubernetes support is much more mature than its Mesos support, and because the reasons for the decision are about to be or have already been overtaken by events - I've heard reports that the features that Kubernetes was missing to allow it to be used for controller nodes, and maybe even compute nodes, are now available. Nonetheless, I expect the level of Magnum support for Mesos is likely workable.) This is where the TripleO strategy of using OpenStack to deploy OpenStack can really pay dividends: because we use Ironic all of our servers are accessible through the Nova API, so in theory we can just run Magnum out of the box.


The chances of me personally having time to prototype this are slim-to-zero, but I think this is a path worth investigating.

cheers,
Zane.


[1] http://martinfowler.com/bliki/BlueGreenDeployment.html
[2] https://puppetlabs.com/introducing-puppet-application-orchestration
