On Tue, 2017-10-03 at 16:03 -0600, Alex Schultz wrote:
On Tue, Oct 3, 2017 at 2:46 PM, Dan Prince <dpri...@redhat.com> wrote:
On Tue, Oct 3, 2017 at 3:50 PM, Alex Schultz <aschu...@redhat.com> wrote:
On Tue, Oct 3, 2017 at 11:12 AM, Dan Prince <dpri...@redhat.com> wrote:
On Mon, 2017-10-02 at 15:20 -0600, Alex Schultz wrote:
Hey Dan,
Thanks for sending out a note about this. I have a few questions inline.
On Mon, Oct 2, 2017 at 6:02 AM, Dan Prince <dpri...@redhat.com> wrote:
One of the things the TripleO containers team is planning on tackling
in Queens is fully containerizing the undercloud. At the PTG we created
an etherpad [1] that contains a list of features that need to be
implemented to fully replace instack-undercloud.
I know we talked about this at the PTG and I was skeptical that this
will land in Queens. With the exception of the Containers team wanting
this, I'm not sure there is an actual end user who is looking for the
feature, so I want to make sure we're not just doing more work because
we as developers think it's a good idea.
I've heard from several operators that they were actually surprised we
implemented containers in the Overcloud first. Validating a new
deployment framework on a single node Undercloud (for operators) before
it takes over their entire cloud deployment has a lot of merit to it
IMO.
When you share the same deployment architecture across the
overcloud/undercloud, it puts us in a better position to decide where to
expose new features to operators first (when creating the undercloud or
overcloud, for example).
Also, if you read my email again, I've explicitly listed the
"Containers" benefit last. While I think moving the undercloud to
containers is a great benefit all by itself, this is more of a
"framework alignment" in TripleO and gets us out of maintaining huge
amounts of technical debt. Re-using the same framework for the
undercloud and overcloud has a lot of merit. It effectively streamlines
the development process for service developers and 3rd parties wishing
to integrate some of their components on a single node. Why be forced
to create a multi-node dev environment if you don't have to (if you
aren't using HA, for example)?
Let's be honest. While instack-undercloud helped solve the old "seed" VM
issue, it was outdated the day it landed upstream. The entire premise of
the tool is that it uses old-style "elements" to create the undercloud,
and we moved away from those as the primary means of driving the
creation of the Overcloud years ago at this point. The new
'undercloud_deploy' installer gets us back to our roots by once again
sharing the same architecture to create the over- and underclouds. A
demo from long ago expands on this idea a bit:
https://www.youtube.com/watch?v=y1qMDLAf26Q&t=5s
In short, we aren't just doing more work because developers think it is
a good idea. This has the potential to be one of the most useful
architectural changes in TripleO that we've made in years. It could
significantly decrease our CI resource usage if we use it to replace the
existing scenario jobs, which take multiple VMs per job. It is a
building block we could use for other features like an HA undercloud.
And yes, it also has a huge impact on developer velocity, in that many
of us already prefer to use the tool as a means of streamlining our
dev/test cycles to minutes instead of hours. Why spend hours running
quickstart Ansible scripts when in many cases you can just doit.sh?
https://github.com/dprince/undercloud_containers/blob/master/doit.sh
So, like I've repeatedly said, I'm not completely against it, as I agree
what we have is not ideal. I'm not -2, I'm -1 pending additional
information. I'm trying to be realistic and reduce our risk for this
cycle.
I think this reduces our complexity greatly in that, once it is
completed, it will allow us to eliminate two projects (instack and
instack-undercloud) and the maintenance thereof. Furthermore, as this
dovetails nicely with the Ansible
I agree. So I think there are some misconceptions here about my
thoughts on this effort. I am not against this effort. I am for this
effort and wish to see more of it. I want to see the effort
communicated publicly via ML and IRC meetings. What I am against is
switching the default undercloud method until the containerization of
the undercloud has the appropriate test coverage and documentation to
ensure it is on par with what it is replacing. Does this make sense?
IMHO doit.sh is not acceptable as an undercloud installer, and this is
what I've been trying to point out as the actual impact to the end user
who has to use this thing.
doit.sh is an example of where the effort is today. It is essentially
the same stuff we document online here:
http://tripleo.org/install/containers_deployment/undercloud.html.
Similar to quickstart, it is just something meant to help you set up a
dev environment.
Right, providing something that the non-developer uses vs providing
something for hacking are two separate things. Making it consumable by
the end user (not the developer) is what I'm pointing out needs to be
accounted for. This is a recurring theme that I have pushed for in
OpenStack to ensure that the operator (the actual end user) is
accounted for when making decisions. TripleO has not done a good job of
this either. Sure, the referenced documentation works for the dev case,
but probably not for the actual deployer/operator case.
This will come in time. What I would encourage us to do upstream is
make as much progress on this in Queens as possible so that getting to
the point of polishing our documentation is the focus... instead of the
remaining work.
And to be clear, all of this work advocates for the Operator just as
much as it does for the developer. No regressions, improved Ansible
feedback on the CLI, potential for future features around multitude and
alignment of the architecture around containers. Boom! I think
operators will like all of this. We can and will document it.
There needs to be a migration guide or documentation of old
configuration -> new configuration for the people who are familiar with
the non-containerized undercloud vs the containerized undercloud. Do we
have all the use cases accounted for, etc.? This is the part that I
don't think we have figured out and which is what I'm asking that we
make sure we account for with this.
The use case is to replace instack-undercloud with no feature
regressions.
We have an established installation method for the undercloud that,
while it isn't great, isn't a bash script with git fetches, etc. So as
for the implementation, this is what I want to see properly fleshed out
prior to accepting this feature as complete for Queens (and the new
default).
Of course the feature would need to prove itself before it becomes the
new default Undercloud. I'm trying to build consensus and get the team
focused on these things.
What strikes me as odd is your earlier comment about "I want to make
sure we're not just doing more work because we as developers think it's
a good idea." I'm a developer and I do think this is a good idea. Please
don't try to de-motivate this effort just because you happen to believe
this. It was accepted for Pike and unfortunately we didn't get enough
buy-in early enough to get focus on it. Now that is starting to change,
and just as it is, you are suggesting we not keep it a priority?
Once again, I agree and I am on board with the end goal that I think
this effort is trying to achieve. What I am currently not on board with
is the time frame for Queens, based on concerns previously mentioned.
This is not about trying to demotivate an effort. It's about ensuring
quality and something that is consumable by an additional set of end
users of the software (the operator/deployer, not the developer). Given
that we have not finished the overcloud deployment and are still
working on fixing items found for that, I personally feel it's a bit
early to consider switching the undercloud default install to a
containerized method. That being said, I have repeatedly stated that if
we account for updates, upgrades, docs and the operator UX, there are
no problems with this effort. I just don't think it's realistic given
current timelines (~9 weeks). Please feel free to provide
information/patches to the contrary.
Whether this feature makes the release or not, I think it is too early
to say. What I can say is that the amount of work remaining on the
Undercloud feature is IMO a good bit less than what we knocked out in
the last release:
https://etherpad.openstack.org/p/tripleo-composable-containers-undercloud
And regardless of whether we make the release or not, there is huge
value in moving the work forward now... if only to put us in a better
position for the next release.
I've been on the containers team for a while now and I'm more familiar
with the velocity that we could handle. Let us motivate ourselves and
give updates along the way over the next 2 months as this effort
progresses. Please don't throw "cold water" on why you don't think we
are going to make the release (especially as PTL, this can be quite
harmful to the effort for some). In fact, let's just stop talking about
Queens and Rocky entirely. I think we can agree that this feature is a
high priority and have people move the effort forward as much as we
can.
This *is* a very important feature. It can be fun to work on. Let those
of us who are doing the work finish scoping it and at least have a
chance at making progress before you throw weight against us not making
the release months from now.
I have not said "don't work on it". I just want to make sure we have
all the pieces in place needed to consider it a proper replacement for
the existing undercloud installation (by M2). If anything, there's
probably more work that needs to be done, and if we want to make it a
priority to happen, then it needs to be documented and communicated so
folks can assist as they have cycles.
I would like to see a plan of what features need to be added (e.g. the
stuff on the etherpad), folks assigned to do this work, and estimated
timelines. Given that we shouldn't be making major feature changes
after M2 (~9 weeks), I want to get an understanding of what is
realistically going to make it. If after reviewing the initial details
we find that it's not actually going to make M2, then let's agree to
this now rather than trying to force it in at the end.
All of this is forthcoming. Those details will come in time.
I know you've been a great proponent of the containerized undercloud
and I agree it offers a lot more for development efforts. But I just
want to make sure that we are getting all the feedback we can before
continuing down this path. Since, as you point out, a bunch of this
work is already available for consumption by developers, I don't see
making it the new default as a requirement for Queens unless it's fully
implemented and tested. There's nothing stopping folks from using it
now and making incremental improvements during Queens, with a
commitment to making it the new default for Rocky.
The point of this cycle was supposed to be more stabilization/getting
all the containers in place. Doing something like this seems to go
against what we were actually trying to achieve. I'd rather make
smaller incremental progress with your proposal being the end goal and
agree that perhaps Rocky is more realistic for the default cutover.
I thought the point of this release was full containerization? And part
of that is containerizing the undercloud too, right?
Not that I was aware of. Others have asked because they were not aware
that it included the undercloud. Given that we eventually want to look
to Kubernetes, maybe we don't need to containerize the undercloud, as
it could be discarded with that switch.
I don't think so. The whole point of the initial Undercloud work was
that it aligns the architectures. Using Kubernetes to maintain an
Undercloud would also be a valid approach I think. Perhaps a bit
overkill but it would be a super useful dev environment tool to develop
Kubernetes services on regardless.
And again, there are no plans to containerize instack-undercloud
components as is. I think we have agreement that using containers in
the Undercloud is a high priority and we need to move this effort
forwards.
That's probably a longer discussion. It might need to be researched
which is why it's important to understand why we're doing the
containerization effort and what exactly it entails. Given that I
don't think we're looking to deploy kubernetes via
THT/tripleo-puppet/containers, I wonder what impact this would have
on this effort? That's probably a conversation for another thread.
Lastly, this isn't just a containers team thing. We've been using the
undercloud_deploy architecture across many teams to help develop for
almost an entire cycle now. Huge benefits. I would go as far as saying
that undercloud_deploy was *the* biggest feature in Pike that enabled
us to bang out a majority of the docker/service templates in
tripleo-heat-templates.
Given that the etherpad appears to contain a pretty big list of
features, are we going to be able to land all of them by M2? Would it be
beneficial to craft a basic spec related to this to ensure we are not
missing additional things?
I'm not sure there is a lot of value in creating a spec at this point.
We've already got an approved blueprint for the feature in Pike here:
https://blueprints.launchpad.net/tripleo/+spec/containerized-undercloud
I think we might get more velocity out of grooming the etherpad and
perhaps dividing this work among the appropriate teams.
That's fine, but I would like to see additional efforts made to
organize this work, assign folks and add proper timelines.
Benefits of this work:
-Alignment: aligning the undercloud and overcloud installers gets rid of
dual maintenance of services.
I like reusing existing stuff. +1
-Composability: tripleo-heat-templates and our new Ansible architecture
around it are composable. This means any set of services can be used to
build up your own undercloud. In other words, the framework here isn't
just useful for "underclouds". It is really the ability to deploy
TripleO on a single node with no external dependencies: a single node
TripleO installer. The containers team has already been leveraging the
existing (experimental) undercloud_deploy installer to develop services
for Pike.
Is this something that is actually being asked for, or is this just an
added bonus because it allows developers to reduce what is actually
being deployed for testing?
There is an implied ask for this feature when a new developer starts to
use TripleO. Right now the resource bar is quite high for TripleO. You
have to have a multi-node development environment at the very least
(one undercloud node and one overcloud node). The ideas we are talking
about here short-circuit this in many cases... where if you aren't
testing HA services or Ironic, you could simply use undercloud_deploy to
test tripleo-heat-templates changes on a single VM. Fewer resources, and
much less time spent learning and waiting.
IMHO the undercloud install is not the limiting factor for new
developers, and I'm not sure this is actually reducing that complexity.
It does reduce the amount of hardware needed to develop some items, but
there's a cost in complexity in moving the configuration to THT, which
is already where many people struggle. As I previously mentioned,
there's nothing stopping us from promoting the containerized undercloud
as a development tool and ensuring it's fully featured before switching
to it as the default at a later date.
Because the new undercloud_deploy installer uses t-h-t, we get
containers for free. Additionally, as we convert over to Ansible instead
of Heat software deployments, we also get better operator feedback
there as well. Wouldn't it be nice to have an Undercloud installer
driven by Ansible instead of Python and tripleo-image-elements?
Yup, and once again I recognize this as a benefit.
I linked doit.sh above because (as you can see if you actually go and
look at the recent patches) we are already wiring these things up right
now (before M1!) and it looks really nice. As we eventually move away
from Puppet for configuration, that too goes away. So I think the idea
here is a net reduction in complexity because we no longer have to
maintain instack-undercloud, puppet modules, and elements.
It isn't that the undercloud install is a limiting factor. It is that
the set of services making up your "Undercloud" can be anything you
want, because t-h-t supports all of our services. Anything you want
with minimal t-h-t, Ansible, and containers. This means you can
effectively develop on a single node for many cases and it will just
work in a multi-node Overcloud setup too, because we have the same
architecture.
My concern is making sure we aren't moving too fast and introducing
more regressions/bugs/missing use cases/etc. My hope is that by
documenting all of this, ensuring we have proper expectations around a
definition of done (and time frames), and allowing for additional
review, we will reduce the risk introduced by this switch. These types
of things align with what we talked about at the PTG during the
retro[0] (see: start define definition of done, start status reporting
on ML, stop over committing, stop big change without tests, less
complexity, etc, etc). This stuff's complicated; let's make sure we do
it right.
Thanks,
-Alex
[0] http://people.redhat.com/aschultz/denver-ptg/tripleo-ptg-retro.jpg
Dan
-Development: The containerized undercloud is a great development tool.
It utilizes the same framework as the full overcloud deployment but
takes about 20 minutes to deploy. This means faster iterations, less
waiting, and more testing. Having this be a first class citizen in the
ecosystem will ensure this platform is functioning for developers to
use all the time.
This seems to go with the previous question about the re-usability for
people who are not developers. Has everyone (including non-container
folks) tried this out and attested that it's a better workflow for
them? Are there use cases that are made worse by switching?
I would let others chime in, but the feedback I've gotten has mostly
been that it improves the dev/test cycle greatly.
-CI resources: better use of CI resources. At the PTG we received
feedback from the OpenStack infrastructure team that our upstream CI
resource usage is quite high at times (even as high as 50% of the
total). Because of the shared framework and single node capabilities,
we can re-architect much of our upstream CI matrix around single node.
We no longer require multinode jobs to be able to test many of the
services in tripleo-heat-templates... we can just use a single cloud VM
instead. We'll still want multinode undercloud -> overcloud jobs for
testing things like HA and baremetal provisioning. But we can cover a
large set of the services (in particular many of the new scenario jobs
we added in Pike) with single node CI test runs in much less time.
I like this idea but would like to see more details around this.
Since this is a new feature, we need to make sure that we are properly
covering the containerized undercloud with CI as well. I think we need
3 jobs to properly cover this feature before marking it done. I added
them to the etherpad, but I think we need to ensure the following 3
jobs are defined and voting by M2 to consider actually switching from
the current instack-undercloud installation to the containerized
version.
1) undercloud-containers - a containerized install, should be voting by
M1
2) undercloud-containers-update - minor updates run on containerized
underclouds, should be voting by M2
3) undercloud-containers-upgrade - major upgrade from non-containerized
to containerized undercloud, should be voting by M2.
If we have these jobs, is there anything we can drop or mark as covered
that is currently being covered by an overcloud job?
Can you please comment on whether these expectations are achievable? If
they are not achievable, I don't think we can agree to switch the
default for Queens. As we shipped 'undercloud deploy' as experimental
for Pike, it's well within reason to continue to do so for Queens.
Perhaps we change the labeling to beta or work it into a --containerized
option for 'undercloud install'.
I think my ask for the undercloud-containers job as non-voting by M1 is
achievable today because it's currently green (pending any zuul
freezes). My concern is really that minor updates and upgrades need to
be understood and accounted for ASAP. If we're truly able to reuse some
of the work we did for O->P upgrades, then these should be fairly
straightforward things to accomplish and there would be fewer blockers
to making the switch.
-Containers: There are no plans to containerize the existing
instack-undercloud work. By moving our undercloud installer to a
tripleo-heat-templates and Ansible architecture, we can leverage
containers. Interestingly, the same installer also supports baremetal
(package) installation at this point. Like the overcloud, however, I
think making containers our undercloud default would better align the
TripleO tooling.
We are actively working through a few issues with the deployment
framework Ansible effort to fully integrate that into the undercloud
installer. We are also reaching out to other teams like the UI and
Security folks to coordinate the efforts around those components. If
there are any questions about the effort or you'd like to be involved
in the implementation, let us know. Stay tuned for more specific updates
as we organize to get as much of this in M1 and M2 as possible.
I would like to see weekly updates on this effort during the IRC
meeting. As previously mentioned around squad status, I'll be asking
for them during the meeting, so it would be nice to get this update on
a weekly basis so we can make sure that we'll be OK to cut over.
Also, what does the cutover plan look like? This is something that
might be beneficial to have in a spec. IMHO, I'm OK to continue pushing
the container effort using the 'openstack undercloud deploy' method for
now. Once we have voting CI jobs and the feature list has been covered,
we can evaluate whether we've made the M2 time frame for switching
'openstack undercloud deploy' to be the new 'undercloud install'. I want
to make sure we don't introduce regressions and are doing things in a
user-friendly fashion, since the undercloud is the first intro an end
user gets to TripleO. It would be a good idea to review what the new
install process looks like and make sure it "just works", given that
the current process[0] (with all its flaws) is fairly trivial to
perform.
Basically, what I would like to see before making this the new default
is:
1) minor updates work (with CI)
2) P->Q upgrades work (with CI)
3) Documentation complete
4) no UX impact for installation (e.g. how they installed it before is
the same as how they install it now for containers)
If these are accounted for and completed before M2, then I would be +2
on the switch.
Thanks,
-Alex
[0] https://docs.openstack.org/tripleo-docs/latest/install/installation/installation.html#installing-the-undercloud
On behalf of the containers team,
Dan
[1] https://etherpad.openstack.org/p/tripleo-queens-undercloud-containers
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev