Hi everyone,

As per below, I've just proposed the creation of a new SIG. Feedback is very welcome. Ideally it would all be collected in the same thread I started over on the openstack-sigs list, but feedback in two places is more useful than none at all, so I'll keep an eye out here too ;-)
Thanks!
Adam

----- Forwarded message from Adam Spiers <aspi...@suse.com> -----

Date: Sun, 17 Sep 2017 23:35:02 +0100
From: Adam Spiers <aspi...@suse.com>
To: OpenStack SIGs list <openstack-s...@lists.openstack.org>
Subject: [Openstack-sigs] [meta] Proposal for self-healing SIG

Hi all,

[TL;DR: we want to set up a "self-healing infrastructure" SIG.]

One of the biggest promises of the cloud vision was the idea that all the infrastructure could be managed in a policy-driven fashion, reacting to failures and other events by automatically healing and optimising services. Most of the components required to implement such an architecture already exist, e.g.

  - Monasca: Monitoring
  - Aodh: Alarming
  - Congress: Policy-based governance
  - Mistral: Workflow
  - Senlin: Clustering
  - Vitrage: Root Cause Analysis
  - Watcher: Optimization
  - Masakari: Compute plane HA
  - Freezer-dr: DR and compute plane HA

However, there is not yet a clear strategy within the community for how these should all tie together.

So at the PTG last week in Denver, we held an initial cross-project meeting to discuss this topic.[0] It was well-attended, with representation from almost all of the relevant projects, and it felt like a very productive session to me. I shall do my best to summarise whilst trying to avoid any misrepresentation ...

There was general agreement that the following actions would be worthwhile:

  - Document reference stacks describing what use cases can already be addressed with the existing projects. (Even better if some of these stacks have already been tested in the wild.)

  - Document what integrations between the projects already exist at a technical level. (We actually began this during the meeting, by placing the projects into phases of a high-level flow, and then collaboratively building a Google Drawing to show that.[1])

  - Collect real-world use cases from operators, including ones which they would like to accomplish but cannot yet.
  - From the above, perform gap analysis to help shape the future direction of these projects, e.g. through specs targeting those gaps.

  - Perform overlap analysis to help ensure that the projects are correctly scoped and integrate well without duplicating any significant effort.[2]

  - Set up a SIG[3] to promote further discussion across the projects and with operators. I talked to Thierry afterwards, and consequently this email is the first step on that path :-)

  - Allocate the SIG a mailing list prefix: "[self-healing]" or similar.

  - Set up a bi-weekly IRC meeting for the SIG.

  - Continue the discussion at the Sydney Forum, since it's an ideal opportunity to get developers and operators together and decide what the next steps should be.

  - Continue the discussion at the next Ops meetup in Tokyo.

I got coerced^Wvolunteered to drive the next steps ;-) So far I have created an etherpad proposing the Forum session[4], and added it to the Forum wiki page[5]. I'll also add it to the SIG wiki page[6].

There were some things we did not reach a concrete conclusion on:

  - What should the SIG be called? We felt that "self-healing" was pretty darn close to capturing the intent of the topic. However, as a natural pedant, I couldn't help but notice that technically speaking, that would most undesirably exclude Watcher, because the optimization it provides isn't *quite* "healing": the word "healing" implies that something is sick, and optimization can be applied even when the cloud is perfectly healthy. Any suggestions for a name with a marginally wider scope would be gratefully received.

  - Should the SIG be scoped to only focus on self-healing (and self-optimization) of OpenStack infrastructure, or should it also include self-healing of workloads? My feeling is that we should keep it scoped to the infrastructure which falls under the responsibility of the cloud operators; anything user-facing would be very different from a process perspective.
  - How should the SIG's governance be set up? Unfortunately it didn't occur to me to raise this question during the discussion, but I've since seen that the k8s SIG managed to make some decisions in this regard[7], and stealing their idea of a PTL-type model with a minimum of 2 chairs sounds good to me.

  - Which timezone should the IRC meeting be in? As usual, there were interested parties from all the usual continents, so no one time would suit everyone. I guess I can just submit a review to the irc-meetings repo and we can have a voting war in Gerrit ;-/ Another option would be to alternate timezones every week or two.

Feedback on any of this is of course most welcome! After sending this, I'll forward it to openstack-{dev,operators} and ask for any feedback to be submitted here.

Thanks,
Adam

[0] https://etherpad.openstack.org/p/self-healing-queens-ptg
[1] https://goo.gl/Pf2KgJ
[2] Sampath (Masakari PTL), Saad (Freezer PTL), and I had a productive follow-up discussion on how we could aim to re-scope these two projects to avoid unnecessary duplication of effort.
[3] https://ttx.re/introducing-sigs.html
[4] https://etherpad.openstack.org/p/self-healing-rocky-forum
[5] https://wiki.openstack.org/wiki/Forum/Sydney2017
[6] https://wiki.openstack.org/wiki/OpenStack_SIGs
[7] https://etherpad.openstack.org/p/queens-ptg-sig-k8s

_______________________________________________
Openstack-sigs mailing list
openstack-s...@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-sigs

----- End forwarded message -----

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators