On 07/20/2018 02:53 PM, James Slagle wrote:
On Thu, Jul 19, 2018 at 7:13 PM, Ben Nemec <openst...@nemebean.com> wrote:
On 07/19/2018 03:37 PM, Emilien Macchi wrote:
Today I played a little bit with Standalone deployment [1] to deploy a
single OpenStack cloud without the need of an undercloud and overcloud.
The use-case I am testing is the following:
"As an operator, I want to deploy a single node OpenStack, that I can
extend with remote compute nodes on the edge when needed."
We still have a bunch of things to figure out before it works out of the box,
but so far I was able to build something that works, and I found it useful to
share it early to gather some feedback:
https://gitlab.com/emacchi/tripleo-standalone-edge
Keep in mind this is a proof of concept, based on upstream documentation
and re-using 100% of what is in TripleO today. The only thing I'm doing is
changing the environment and the roles for the remote compute node.
I plan to work on cleaning up the manual steps I had to do to make it
work, like hardcoding some hiera parameters, and to figure out how to
override ServiceNetMap.
Anyway, feel free to test / ask questions / provide feedback.
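For anyone who wants a feel for what the standalone PoC involves, it is
roughly one small parameter environment plus one command. The sketch below is
illustrative only: the file name and parameter values are my assumptions, not
taken from Emilien's repo, and the deploy command itself is only shown in a
comment since it needs a prepared host.

```shell
# Illustrative only: the kind of parameter environment the PoC overrides
# by hand (file name and values are assumptions, not the PoC's actual files).
cat > standalone_parameters.yaml <<'EOF'
parameter_defaults:
  CloudName: standalone.localdomain
  Debug: true
  NeutronPublicInterface: eth1
  ControlPlaneStaticRoutes: []
EOF

# The experimental standalone deployment is then roughly one command
# (requires a prepared host, so it is only sketched here):
#   sudo openstack tripleo deploy --templates --standalone \
#     --local-ip 192.168.24.2/24 \
#     -e /usr/share/openstack-tripleo-heat-templates/environments/standalone.yaml \
#     -e "$PWD/standalone_parameters.yaml" \
#     --output-dir "$HOME"
echo "wrote standalone_parameters.yaml"
```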
What is the benefit of doing this over just using deployed-server to install
a remote server from the central management system? You need connectivity
back to the central location anyway. Won't this become unwieldy with a large
number of edge nodes? I thought we told people not to use Packstack for
multi-node deployments for exactly that reason.
I guess my concern is that eliminating the undercloud makes sense for
single-node PoCs and development work, but for what sounds like a
production workload I feel like you're cutting off your nose to spite your
face. In the interest of saving one VM's worth of resources, all of your
day 2 operations now have no built-in orchestration. Every time you want
to change a configuration it's "copy new script to system, ssh to system,
run script, repeat for all systems." So maybe this is a backdoor way to make
Ansible our API? ;-)
I believe Emilien was looking at this POC in part because of some
input from me, so I will attempt to address your questions
constructively.
What you're looking at here is exactly a POC. The deployment is a POC
using the experimental standalone code. I think the use case as
presented by Emilien is something worth considering:
"As an operator, I want to deploy a single node OpenStack, that I can
extend with remote compute nodes on the edge when needed."
I wouldn't interpret that to mean much of anything about eliminating
the undercloud, other than what is stated in the use case. I feel
that jumping to eliminating the undercloud would be an
oversimplification. The goal of the POC isn't packstack parity, or even
necessarily a packstack-like architecture.
Okay, this was the main disconnect for me. I got the impression from
the discussion up until now that eliminating the undercloud was part of
the requirements. Looking back at Emilien's original email, I think I
conflated the standalone PoC description with the use-case description.
My bad.
One of the goals is to see if we can deploy separate disconnected
stacks for Control and Compute. The standalone work happens to be a
good way to test some of that. The use case was written to help
describe and provide an overall picture of what is going on with this
specific POC, with a focus on the edge use case.
You make some points about centralized management and connectivity
back to the central location. Those are the exact sorts of things we
are thinking about when we consider how we will address edge
deployments. If you haven't had a chance yet, check out the Edge
Computing whitepaper from the foundation:
https://www.openstack.org/assets/edge/OpenStack-EdgeWhitepaper-v3-online.pdf
Particularly the challenges outlined around management and deployment
tooling. For lack of anything better I'm calling these the 3 D's:
- Decentralized
- Distributed
- Disconnected
How can TripleO address any of these?
For Decentralized, I'd like to see better separation between the
planning and application of the deployment in TripleO. TripleO has had
the concept of a plan for quite a while, and we've been using it very
effectively for our deployment, but it is somewhat hidden from the
operator. It's not entirely clear to the user that there is any
separation between the plan and the stack, and what benefit there even
is in the plan.
+1. I was disappointed that we didn't adopt the plan as more of a
first-class citizen for cli deployments after it was implemented.
I'd like to address some of that through API improvements around plan
management and making the plan the top level thing being managed
instead of a deployment. We're already moving in this direction with
config-download and a lot of the changes we've made during Queens.
For better or worse, some other tools like Terraform call this out as
one of their main differentiators:
https://www.terraform.io/intro/vs/cloudformation.html (3rd paragraph).
TripleO has long separated the planning and application phases. We
just need to do a better job at developing useful features around that
work. The UI has been taking advantage of it more than anything else
at this point. I'd like to focus a bit more on what benefits we get
from the plan, and how we can turn these into operator value.
Imagine a scenario where you have a plan that has been deployed and
you want to make some changes. You upload a new plan, the plan is
processed, we update a copy of the deployed stack (or perhaps an
ephemeral stack), run config-download, and the operator gets
immediate feedback about what *would* be changed. Heat plays a role
here in giving us a way to orchestrate the plan into a deployment
model.
Ansible also plays a role in that we could take things a step further
and run with --check to provide further feedback before anything is
ever applied or updated. Ongoing work around new baremetal management
workflows via metalsmith will give us more insight into planning the
baremetal deployment. These tools (Heat/Ansible/Metalsmith/etc.) are
technology choices. They are not architectures in and of themselves.
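To make that "preview before apply" flow concrete: the first two commands
below are today's config-download interface, sketched in comments because
they need a live undercloud; the playbook written at the end is a stand-in I
made up to show the kind of task --check/--diff would report on, not real
rendered output.

```shell
# 1. Render the (updated) plan into Ansible playbooks (needs a live
#    undercloud, so only sketched here):
#   openstack overcloud config download --name overcloud \
#     --config-dir config-download

# 2. Dry-run the rendered playbooks to see what *would* change,
#    without touching the nodes:
#   ansible-playbook -i inventory.yaml \
#     config-download/deploy_steps_playbook.yaml --check --diff

# A stand-in for the kind of playbook config-download emits:
mkdir -p config-download
cat > config-download/deploy_steps_playbook.yaml <<'EOF'
- hosts: overcloud
  tasks:
    - name: a step that --check/--diff would report without applying it
      copy:
        dest: /etc/example.conf
        content: "managed by the plan\n"
EOF
echo "wrote stand-in playbook"
```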
You have centralized management of the planning phase, whose output
could be a set of playbooks applied in a decentralized way. For
example, they could be provided via an API and downloaded to a remote
site where an operator is sitting in an emergency response scenario
with some "hardware in a box" that they want to deploy local
compute/storage resources onto and connect to a local network.
Connectivity back to the centralized platform may or may not be
required, depending on which services are deployed.
For Distributed, I think of git. We have built-in git management of
the config-download output. We are discussing (further) git management
of the templates and processed plan. This gives operators some ability
to manage the output in a distributed fashion and make new changes
outside of the centralized platform.
Perhaps in the future, we could offer an API/interface around pulling
any changes back into the represented plan based on what an operator
had changed. Sort of like a pull request for the plan, but by starting
with the output.
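As a toy sketch of that pull-request-style flow, using only plain git: the
repo layout and file contents below are stand-ins, not real config-download
output, but they show how a change made disconnected on site can later be
reviewed centrally.

```shell
# Central side keeps the rendered plan output in git.
set -e
rm -rf plan-central remote-site
git init -q plan-central
cd plan-central
git config user.email "deploy@example.com"
git config user.name "Central Platform"
echo "step: 1" > deploy_steps_playbook.yaml
git add deploy_steps_playbook.yaml
git commit -qm "rendered plan v1"
cd ..

# A remote site clones the rendered output and works disconnected:
git clone -q plan-central remote-site
cd remote-site
git config user.email "operator@example.com"
git config user.name "Site Operator"
echo "step: 1-hotfix" > deploy_steps_playbook.yaml
git commit -qam "emergency change made on site"
cd ..

# When connectivity returns, the central side can review the delta,
# pull-request style:
git -C remote-site log --oneline
git -C remote-site diff HEAD~1 -- deploy_steps_playbook.yaml
```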
Obviously, this needs a lot more definition and refinement than just
"use git". Again, these efforts are about experimenting with use
cases, not technology choices. To get to those experiments quickly,
it may look like we are making rash decisions about using X or Y, but
that's not the driver here.
+1 again. I argued to use git as the storage backend for plans in the
first place. :-) This isn't the exact use case I had in mind, but
there's definitely overlap.
For Disconnected, it also ties into how we'd address decentralized and
distributed. The choice of tooling helps, but it's not as simple as
"use Ansible". Part of the reason we are looking at this POC, and how
to deploy it easily is to investigate questions such as what happens
to the deployed workloads if the compute loses connectivity to the
control plane or management platform. We want to make sure TripleO can
deploy something that can handle these sorts of scenarios. During
periods of disconnection at the edge or other remote sites, operators
may still need to make changes (see points about distributed above).
This is a requirement I was missing as well. If you don't necessarily
have connectivity back to the mothership and need to be able to manage
the deployment anyway then the standalone part is obviously a necessity.
I'd be curious how this works with OpenStack in general, but like you
said this is a PoC to find out.
Using the standalone deployment can help us quickly answer these
questions and develop a "Steel Thread"[1] to build upon.
Ultimately, these are the sorts of high-level designs and architectures
we are beginning to investigate. We are trying to let the use cases
and operator needs drive the design, even while those use cases are
still being understood (see the whitepaper above). It's not about
"just use Ansible" or "rewrite the API".
[1]
http://www.agiledevelopment.org/agile-talk/111-defining-acceptance-criteria-using-the-steel-thread-concept
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev