FYI, a container with net=host runs exactly like it was running outside of a container with respect to iptables/networking. So that should not be an issue. If it can be done on the host, it should be able to happen in a container.
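To make that concrete, here is a minimal sketch of the kind of check you can run (assumptions: the image name is arbitrary, it ships an iptables binary, and the container is granted NET_ADMIN, which iptables needs even just to list rules):

    # With --net=host the container shares the host's network namespace,
    # so iptables here reads (and could modify) the host's rule set directly.
    docker run --rm --net=host --cap-add=NET_ADMIN centos:7 iptables -L -n

If that prints the host's chains, the same iptables manipulation ironic-inspector does on the host should behave identically inside a net=host container.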
Thanks,
Kevin

________________________________
From: Dan Prince [dpri...@redhat.com]
Sent: Wednesday, October 04, 2017 9:50 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [TripleO] containerized undercloud in Queens

On Wed, Oct 4, 2017 at 9:10 AM, Dmitry Tantsur <dtant...@redhat.com> wrote:

(top-posting, as it is not a direct response to a specific line)

This is your friendly reminder that we're not quite near containerized ironic-inspector. The THT for it has probably never been tested at all, and the iptables magic we do may simply not be containers-compatible. Milan would appreciate any help with his ironic-inspector rework.

Thanks Dmitry. Exactly the update I was looking for. Look forward to syncing with Milan on this.

Dan

Dmitry

On 10/04/2017 03:00 PM, Dan Prince wrote:

On Tue, 2017-10-03 at 16:03 -0600, Alex Schultz wrote:

On Tue, Oct 3, 2017 at 2:46 PM, Dan Prince <dpri...@redhat.com> wrote:

On Tue, Oct 3, 2017 at 3:50 PM, Alex Schultz <aschu...@redhat.com> wrote:

On Tue, Oct 3, 2017 at 11:12 AM, Dan Prince <dpri...@redhat.com> wrote:

On Mon, 2017-10-02 at 15:20 -0600, Alex Schultz wrote:

Hey Dan,

Thanks for sending out a note about this. I have a few questions inline.

On Mon, Oct 2, 2017 at 6:02 AM, Dan Prince <dpri...@redhat.com> wrote:

One of the things the TripleO containers team is planning on tackling in Queens is fully containerizing the undercloud. At the PTG we created an etherpad [1] that contains a list of features that need to be implemented to fully replace instack-undercloud.

I know we talked about this at the PTG and I was skeptical that this would land in Queens. With the exception of the containers team wanting this, I'm not sure there is an actual end user who is looking for the feature, so I want to make sure we're not just doing more work because we as developers think it's a good idea.

I've heard from several operators that they were actually surprised we implemented containers in the overcloud first. Validating a new deployment framework on a single-node undercloud (for operators) before it takes over their entire cloud deployment has a lot of merit to it IMO. When you share the same deployment architecture across the overcloud/undercloud, it puts us in a better position to decide where to expose new features to operators first (when creating the undercloud or overcloud, for example).

Also, if you read my email again, I've explicitly listed the "Containers" benefit last. While I think moving the undercloud to containers is a great benefit all by itself, this is more of a "framework alignment" in TripleO and gets us out of maintaining huge amounts of technical debt. Re-using the same framework for the undercloud and overcloud has a lot of merit. It effectively streamlines the development process for service developers, and for 3rd parties wishing to integrate some of their components on a single node. Why be forced to create a multi-node dev environment if you don't have to (if you aren't using HA, for example)?

Let's be honest: while instack-undercloud helped solve the old "seed" VM issue, it was outdated the day it landed upstream. The entire premise of the tool is that it uses old-style "elements" to create the undercloud, and we moved away from those as the primary means of driving the creation of the overcloud years ago at this point.
The new 'undercloud_deploy' installer gets us back to our roots by once again sharing the same architecture to create the overcloud and the undercloud. A demo from long ago expands on this idea a bit: https://www.youtube.com/watch?v=y1qMDLAf26Q&t=5s

In short, we aren't just doing more work because developers think it is a good idea. This has the potential to be one of the most useful architectural changes we've made in TripleO in years. It could significantly decrease our CI resource usage if we use it to replace the existing scenario jobs, which take multiple VMs per job. It is a building block we could use for other features like an HA undercloud. And yes, it also has a huge impact on developer velocity, in that many of us already prefer to use the tool as a means of streamlining our dev/test cycles to minutes instead of hours. Why spend hours running quickstart Ansible scripts when in many cases you can just run doit.sh? https://github.com/dprince/undercloud_containers/blob/master/doit.sh

So like I've repeatedly said, I'm not completely against it, as I agree what we have is not ideal. I'm not -2, I'm -1 pending additional information. I'm trying to be realistic and reduce our risk for this cycle.

This reduces our complexity greatly, I think, in that once it is completed it will allow us to eliminate two projects (instack and instack-undercloud) and the maintenance thereof. Furthermore, this dovetails nicely with the Ansible work.

I agree. So I think there are some misconceptions here about my thoughts on this effort. I am not against this effort. I am for this effort and wish to see more of it. I want to see the effort communicated publicly via the ML and IRC meetings. What I am against is switching the default undercloud method until the containerization of the undercloud has the appropriate test coverage and documentation to ensure it is on par with what it is replacing. Does this make sense? IMHO doit.sh is not acceptable as an undercloud installer, and this is what I've been trying to point out as the actual impact to the end user who has to use this thing.

doit.sh is an example of where the effort is today. It is essentially the same stuff we document online here: http://tripleo.org/install/containers_deployment/undercloud.html. Similar to quickstart, it is just something meant to help you set up a dev environment.

Right, providing something that the non-developer uses vs. providing something for hacking are two separate things. Making it consumable by the end user (not the developer) is what I'm pointing out needs to be accounted for. This is a recurring theme that I have pushed for in OpenStack, to ensure that the operator (the actual end user) is accounted for when making decisions. TripleO has not done a good job of this either. Sure, the referenced documentation works for the dev case, but probably not the actual deployer/operator case.

This will come in time. What I would encourage us to do upstream is make as much progress on this in Queens as possible, so that polishing our documentation becomes the focus... instead of the remaining work. And to be clear, all of this work advocates for the operator just as much as it does for the developer. No regressions, improved Ansible feedback on the CLI, potential for a multitude of future features, and alignment of the architecture around containers. Boom! I think operators will like all of this. We can and will document it.
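For anyone who hasn't opened the doit.sh link above, the single-node dev loop it drives boils down to roughly the following (a sketch only; the package name, flags, and paths here are assumptions and the real script may differ -- see the repository for the current version):

    # Install the client, then deploy a containerized undercloud on this
    # node straight from tripleo-heat-templates (no instack-undercloud).
    sudo yum -y install python-tripleoclient
    sudo openstack undercloud deploy \
        --templates /usr/share/openstack-tripleo-heat-templates \
        --local-ip 192.168.24.1/24

The debate above is not about these commands themselves, but about whether this flow, plus docs, updates, and upgrades, is ready to become the default operator-facing installer.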
There needs to be a migration guide, or documentation mapping the old configuration to the new configuration, for the people who are familiar with the non-containerized undercloud vs. the containerized undercloud. Do we have all the use cases accounted for, etc.? This is the part that I don't think we have figured out, and which is what I'm asking that we make sure we account for with this.

The use case is to replace instack-undercloud with no feature regressions.

We have an established installation method for the undercloud which, while it isn't great, isn't a bash script with git fetches, etc. So as for the implementation, this is what I want to see properly fleshed out prior to accepting this feature as complete for Queens (and the new default).

Of course the feature would need to prove itself before it becomes the new default undercloud. I'm trying to build consensus and get the team focused on these things. What strikes me as odd is your earlier comment about "I want to make sure we're not just doing more work because we as developers think it's a good idea." I'm a developer and I do think this is a good idea. Please don't try to de-motivate this effort just because you happen to believe this. It was accepted for Pike and unfortunately we didn't get enough buy-in early enough to get focus on it. Now that is starting to change, and just as it is you are suggesting we not keep it a priority?

Once again, I agree and I am on board with the end goal that I think this effort is trying to achieve. What I am currently not on board with is the time frame for Queens, based on the concerns previously mentioned. This is not about trying to demotivate an effort. It's about ensuring quality and something that is consumable by an additional set of end users of the software (the operator/deployer, not the developer). Given that we have not finished the overcloud deployment and are still working on fixing items found for that, I personally feel it's a bit early to consider switching the undercloud default install to a containerized method. That being said, I have repeatedly stated that if we account for updates, upgrades, docs and the operator UX, there are no problems with this effort. I just don't think it's realistic given current timelines (~9 weeks). Please feel free to provide information/patches to the contrary.

Whether this feature makes the release or not, I think it is too early to say. What I can say is that the amount of work remaining on the undercloud feature is IMO a good bit less than what we knocked out in the last release: https://etherpad.openstack.org/p/tripleo-composable-containers-undercloud

And regardless of whether we make the release or not, there is huge value in moving the work forward now... if only to put us in a better position for the next release. I've been on the containers team for a while now and I'm more familiar with the velocity that we can handle. Let us motivate ourselves and give updates along the way over the next 2 months as this effort progresses. Please don't throw "cold water" on the effort by arguing we aren't going to make the release (especially as PTL, this can be quite harmful to the effort for some). In fact, let's just stop talking about Queens and Rocky entirely. I think we can agree that this feature is a high priority and have people move the effort forward as much as we can. This *is* a very important feature. It can be fun to work on.
Let those of us who are doing the work finish scoping it and at least have a chance at making progress before you throw weight against us not making the release months from now.

I have not said don't work on it. I just want to make sure we have all the pieces in place needed to consider it a proper replacement for the existing undercloud installation (by M2). If anything there's probably more work that needs to be done, and if we want to make it a priority to happen, then it needs to be documented and communicated so folks can assist as they have cycles. I would like to see a plan of what features need to be added (e.g. the stuff on the etherpad), folks assigned to do this work, and estimated timelines. Given that we shouldn't be making major feature changes after M2 (~9 weeks), I want to get an understanding of what is realistically going to make it. If after reviewing the initial details we find that it's not actually going to make M2, then let's agree to this now rather than trying to force it in at the end.

All of this is forthcoming. Those details will come in time.

I know you've been a great proponent of the containerized undercloud and I agree it offers a lot more for development efforts. But I just want to make sure that we are getting all the feedback we can before continuing down this path. Since, as you point out, a bunch of this work is already available for consumption by developers, I don't see making it the new default as a requirement for Queens unless it's fully implemented and tested. There's nothing stopping folks from using it now and making incremental improvements during Queens, and we could commit to making it the new default for Rocky. The point of this cycle was supposed to be more stabilization/getting all the containers in place. Doing something like this seems to go against what we were actually trying to achieve. I'd rather make smaller incremental progress, with your proposal being the end goal, and agree that perhaps Rocky is more realistic for the default cutover.

I thought the point of this release was full containerization? And part of that is containerizing the undercloud too, right?

Not that I was aware of. Others have asked because they have not been aware that it included the undercloud. Given that we eventually want to look at Kubernetes, maybe we don't need to containerize the undercloud, as it might simply be discarded with that switch.

I don't think so. The whole point of the initial undercloud work was that it aligns the architectures. Using Kubernetes to maintain an undercloud would also be a valid approach, I think. Perhaps a bit overkill, but it would be a super useful dev environment tool for developing Kubernetes services on regardless. And again, there are no plans to containerize instack-undercloud components as is. I think we have agreement that using containers in the undercloud is a high priority and we need to move this effort forward.

That's probably a longer discussion. It might need to be researched, which is why it's important to understand why we're doing the containerization effort and what exactly it entails. Given that I don't think we're looking to deploy Kubernetes via THT/tripleo-puppet/containers, I wonder what impact this would have on this effort? That's probably a conversation for another thread.

Lastly, this isn't just a containers team thing. We've been using the undercloud_deploy architecture across many teams to help develop for almost an entire cycle now. Huge benefits.
I would go as far as saying that undercloud_deploy was *the* biggest feature in Pike that enabled us to bang out a majority of the docker/service templates in tripleo-heat-templates.

Given that the etherpad appears to contain a pretty big list of features, are we going to be able to land all of them by M2? Would it be beneficial to craft a basic spec related to this to ensure we are not missing additional things?

I'm not sure there is a lot of value in creating a spec at this point. We've already got an approved blueprint for the feature in Pike here: https://blueprints.launchpad.net/tripleo/+spec/containerized-undercloud

I think we might get more velocity out of grooming the etherpad and perhaps dividing this work among the appropriate teams.

That's fine, but I would like to see additional efforts made to organize this work, assign folks, and add proper timelines.

Benefits of this work:

-Alignment: aligning the undercloud and overcloud installers gets rid of dual maintenance of services.

I like reusing existing stuff. +1

-Composability: tripleo-heat-templates and our new Ansible architecture around it are composable. This means any set of services can be used to build up your own undercloud. In other words, the framework here isn't just useful for "underclouds". It is really the ability to deploy TripleO on a single node with no external dependencies. A single-node TripleO installer. The containers team has already been leveraging the existing (experimental) undercloud_deploy installer to develop services for Pike.

Is this something that is actually being asked for, or is this just an added bonus because it allows developers to reduce what is actually being deployed for testing?

There is an implied ask for this feature when a new developer starts to use TripleO. Right now the resource bar for TripleO is quite high. You have to have a multi-node development environment at the very least (one undercloud node, and one overcloud node). The ideas we are talking about here short-circuit this in many cases... where if you aren't testing HA services or Ironic you could simply use undercloud_deploy to test tripleo-heat-templates changes on a single VM. Fewer resources, and much less time spent learning and waiting.

IMHO I don't think the undercloud install is the limiting factor for new developers, and I'm not sure this is actually reducing that complexity. It does reduce the amount of hardware needed to develop some items, but there's a cost in complexity in moving the configuration to THT, which is already where many people struggle. As I previously mentioned, there's nothing stopping us from promoting the containerized undercloud as a development tool and ensuring it's fully featured before switching to it as the default at a later date.

Because the new undercloud_deploy installer uses t-h-t, we get containers for free. Additionally, as we convert over to Ansible instead of Heat software deployments, we also get better operator feedback there as well. Wouldn't it be nice to have an undercloud installer driven by Ansible instead of Python and tripleo-image-elements?

Yup, and once again I recognize this as a benefit.

The reason I linked doit.sh above is that (if you actually go and look at the recent patches) we are already wiring these things up right now (before M1!) and it looks really nice. As we eventually move away from Puppet for configuration, that too goes away.
So I think the idea here is a net reduction in complexity, because we no longer have to maintain instack-undercloud, puppet modules, and elements. It isn't that the undercloud install is a limiting factor. It is that the set of services making up your "undercloud" can be anything you want, because t-h-t supports all of our services. Anything you want with minimal t-h-t, Ansible, and containers. This means you can effectively develop on a single node for many cases, and it will just work in a multi-node overcloud setup too because we have the same architecture.

My concern is making sure we aren't moving too fast and introducing more regressions/bugs/missing use cases/etc. My hope is that by documenting all of this, ensuring we have proper expectations around a definition of done (and time frames), and allowing for additional review, we will reduce the risk introduced by this switch. These types of things align with what we talked about at the PTG during the retro [0] (see: start defining a definition of done, start status reporting on the ML, stop over-committing, stop making big changes without tests, less complexity, etc.). This stuff's complicated, let's make sure we do it right.

Thanks,
-Alex

[0] http://people.redhat.com/aschultz/denver-ptg/tripleo-ptg-retro.jpg

Dan

-Development: The containerized undercloud is a great development tool. It utilizes the same framework as the full overcloud deployment but takes about 20 minutes to deploy. This means faster iterations, less waiting, and more testing. Having this be a first-class citizen in the ecosystem will ensure this platform is functioning for developers to use all the time.

Seems to go with the previous question about the re-usability for people who are not developers. Has everyone (including non-container folks) tried this out and attested that it's a better workflow for them? Are there use cases that are made worse by switching?

I would let others chime in, but the feedback I've gotten has mostly been that it improves the dev/test cycle greatly.

-CI resources: better use of CI resources. At the PTG we received feedback from the OpenStack infrastructure team that our upstream CI resource usage is quite high at times (even as high as 50% of the total). Because of the shared framework and single-node capabilities, we can re-architect much of our upstream CI matrix around single-node jobs. We no longer require multinode jobs to be able to test many of the services in tripleo-heat-templates... we can just use a single cloud VM instead. We'll still want multinode undercloud -> overcloud jobs for testing things like HA and baremetal provisioning. But we can cover a large set of the services (in particular many of the new scenario jobs we added in Pike) with single-node CI test runs in much less time.

I like this idea but would like to see more details around it. Since this is a new feature, we need to make sure that we are properly covering the containerized undercloud with CI as well. I think we need 3 jobs to properly cover this feature before marking it done. I added them to the etherpad, but I think we need to ensure the following 3 jobs are defined and voting by M2 to consider actually switching from the current instack-undercloud installation to the containerized version.
1) undercloud-containers - a containerized install, should be voting by M1
2) undercloud-containers-update - minor updates run on containerized underclouds, should be voting by M2
3) undercloud-containers-upgrade - major upgrade from a non-containerized to a containerized undercloud, should be voting by M2

If we have these jobs, is there anything we can drop or mark as covered that is currently being covered by an overcloud job?

Can you please comment on these expectations as being achievable? If they are not achievable, I don't think we can agree to switch the default for Queens. As we shipped 'undercloud deploy' as experimental for Pike, it's well within reason to continue to do so for Queens. Perhaps we change the labeling to beta, or work it into a --containerized option for 'undercloud install'. I think my ask for the undercloud-containers job as non-voting by M1 is achievable today because it's currently green (pending any zuul freezes). My concern is really that minor updates and upgrades need to be understood and accounted for ASAP. If we're truly able to reuse some of the work we did for O->P upgrades, then these should be fairly straightforward things to accomplish and there would be fewer blockers to make the switch.

-Containers: There are no plans to containerize the existing instack-undercloud work. By moving our undercloud installer to a tripleo-heat-templates and Ansible architecture, we can leverage containers. Interestingly, the same installer also supports baremetal (package) installation as well at this point. Like the overcloud, however, I think making containers our undercloud default would better align the TripleO tooling.

We are actively working through a few issues with the deployment framework Ansible effort to fully integrate that into the undercloud installer. We are also reaching out to other teams, like the UI and Security folks, to coordinate the efforts around those components. If there are any questions about the effort or you'd like to be involved in the implementation, let us know. Stay tuned for more specific updates as we organize to get as much of this into M1 and M2 as possible.

I would like to see weekly updates on this effort during the IRC meeting. As previously mentioned around squad status, I'll be asking for them during the meeting, so it would be nice to get an update on this on a weekly basis so we can make sure that we'll be OK to cut over. Also, what does the cut-over plan look like? This is something that might be beneficial to have in a spec.

IMHO, I'm OK to continue pushing the container effort using the 'openstack undercloud deploy' method for now. Once we have voting CI jobs and the feature list has been covered, then we can evaluate whether we've made the M2 time frame for switching 'openstack undercloud deploy' to be the new undercloud install. I want to make sure we don't introduce regressions and are doing things in a user-friendly fashion, since the undercloud is the first intro an end user gets to TripleO. It would be a good idea to review what the new install process looks like and make sure it "just works", given that the current process [0] (with all its flaws) is fairly trivial to perform.

Basically what I would like to see before making this the new default is:
1) minor updates work (with CI)
2) P->Q upgrades work (with CI)
3) documentation complete
4) no UX impact for installation (e.g. how they installed it before is the same as they install it now for containers)

If these are accounted for and completed before M2, then I would be +2 on the switch.
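For reference, the entry points being discussed look roughly like this (the --templates path is an assumption, and the --containerized flag is the suggestion from this thread, not an existing option):

    # Current default: instack-undercloud based installer.
    openstack undercloud install

    # Experimental t-h-t/containers based installer shipped in Pike.
    openstack undercloud deploy \
        --templates /usr/share/openstack-tripleo-heat-templates

    # One option floated above: keep the operator-facing command the same
    # and hide the new path behind a flag (hypothetical).
    openstack undercloud install --containerized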
Thanks,
-Alex

[0] https://docs.openstack.org/tripleo-docs/latest/install/installation/installation.html#installing-the-undercloud

On behalf of the containers team,

Dan

[1] https://etherpad.openstack.org/p/tripleo-queens-undercloud-containers
__________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev