I have to agree with James. My affinity and anti-affinity rules have
nothing to do with NFV. Anti-affinity is almost always a failure-domain
solution. I'm not sure we have users actually choosing affinity, though
when they do it's likely for network speed issues and/or some badly
architected (or perceived) need for coupling.
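For anyone who hasn't wired this up before: the failure-domain case
James describes is normally expressed as a server group with an
anti-affinity policy, with the group id passed as a scheduler hint at
boot time. A rough sketch with python-novaclient is below (the auth
details, image/flavor IDs and names are all placeholders; your
microversion and policy options may differ). Nothing NFV-specific in
any of it.

    # Rough sketch: assumes python-novaclient and keystoneauth1 are
    # installed; all names, URLs and IDs below are placeholders.
    from keystoneauth1 import session
    from keystoneauth1.identity import v3
    from novaclient import client

    auth = v3.Password(auth_url="https://keystone.example.com/v3",
                       username="demo", password="secret",
                       project_name="demo",
                       user_domain_id="default",
                       project_domain_id="default")
    nova = client.Client("2.1", session=session.Session(auth=auth))

    # One server group per failure-domain-sensitive tier of the app.
    group = nova.server_groups.create(name="web-tier",
                                      policies=["anti-affinity"])

    # Each member booted with the group hint should land on a distinct
    # host; if the policy can't be satisfied the boot fails rather than
    # colocating instances.
    for i in range(3):
        nova.servers.create(name="web-%d" % i,
                            image="<image-uuid>",
                            flavor="<flavor-id>",
                            scheduler_hints={"group": group.id})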
On Mon, May 22, 2017 at 12:45 PM, James Penick <jpen...@gmail.com> wrote:
>
> On Mon, May 22, 2017 at 10:54 AM, Jay Pipes <jaypi...@gmail.com> wrote:
>
>> Hi Ops,
>
> Hi!
>
>> For class b) causes, we should be able to solve this issue when the
>> placement service understands affinity/anti-affinity (maybe
>> Queens/Rocky). Until then, we propose that instead of raising a
>> Reschedule when an affinity constraint is violated at the last minute
>> due to a racing scheduler decision, we simply set the instance to an
>> ERROR state.
>>
>> Personally, I have only ever seen anti-affinity/affinity use cases in
>> relation to NFV deployments, and in every NFV deployment of OpenStack
>> there is a VNFM or MANO solution that is responsible for the
>> orchestration of instances belonging to various service function
>> chains. I think it is reasonable to expect the MANO system to be
>> responsible for attempting a re-launch of an instance that was set to
>> ERROR due to a last-minute affinity violation.
>>
>> **Operators, do you agree with the above?**
>
> I do not. My affinity and anti-affinity use cases reflect the need to
> build large applications across failure domains in a datacenter.
>
> Anti-affinity: Most anti-affinity use cases relate to the ability to
> guarantee that instances are scheduled across failure domains; others
> relate to security compliance.
>
> Affinity: Hadoop/big data deployments have affinity use cases, where
> nodes processing data need to be in the same rack as the nodes which
> house the data. This is a common setup for large Hadoop deployers.
>
>> I recognize that large Ironic users expressed their concerns about
>> IPMI/BMC communication being unreliable and not wanting to have users
>> manually retry a baremetal instance launch. But, on this particular
>> point, I'm of the opinion that Nova should just do one thing and do
>> it well. Nova isn't an orchestrator, nor is it intended to be a "just
>> continually try to get me to this eventual state" system like
>> Kubernetes.
>
> Kubernetes is a larger orchestration platform that provides autoscale,
> among other things. I'm not asking Nova to provide autoscale.
>
> I agree that Nova should do one thing and do it really well, and in my
> mind that thing is reliable provisioning of compute resources. I -AM-
> asking OpenStack's compute platform to provision a discrete compute
> resource reliably. This means overcoming common and simple error
> cases. As a deployer of OpenStack I'm trying to build a cloud that
> wraps the chaos of infrastructure and presents a reliable facade. When
> my users issue a boot request, I want to see it fulfilled. I don't
> expect a 100% guarantee against every possible failure, but I expect
> (and my users demand) that my "infrastructure as a service" API make
> reasonable accommodation to overcome common failures.
>
>> If we removed Reschedule for class c) failures entirely, large Ironic
>> deployers would have to train users to manually retry a failed launch
>> or would need to write a simple retry mechanism into whatever
>> client/UI they expose to their users.
>>
>> **Ironic operators, would the above decision force you to abandon
>> Nova as the multi-tenant BMaaS facility?**
>
> I just glanced at one of my production clusters and found around 7K
> users defined, many of whom use OpenStack on a daily basis. When they
> issue a boot call, they expect that request to be honored. From their
> perspective, if they call AWS, they get what they ask for. If you
> remove reschedules you're not just breaking the expectations of a
> single deployer, but of my thousands of engineers who, every day, rely
> on OpenStack to manage their stack.
>
> I don't have an "I'll take my football and go home" mentality. But if
> you remove the ability for the compute provisioning API to present a
> reliable facade over infrastructure, I have to go write something
> else, or patch it back in. Then it's even harder for me to get and
> stay current with OpenStack.
>
> During the summit the agreement was, if I recall, that reschedules
> would happen within a cell, and not between the parent and the cell.
> That was completely acceptable to me.
>
> -James
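As for the "simple retry mechanism" Jay says deployers could write into
their client/UI: it isn't exotic, but it's exactly the kind of logic
that would end up duplicated in every client and portal if reschedules
went away. A rough sketch, assuming a python-novaclient handle like the
one above; the attempt count and polling interval are arbitrary:

    # Rough sketch of a client-side retry: re-issue the boot if the
    # instance lands in ERROR instead of ACTIVE. Assumes a 'nova'
    # client handle as in the earlier sketch; boot_kwargs carries the
    # usual name/image/flavor arguments.
    import time

    def boot_with_retry(nova, attempts=3, poll=10, **boot_kwargs):
        for _ in range(attempts):
            server = nova.servers.create(**boot_kwargs)
            while server.status not in ("ACTIVE", "ERROR"):
                time.sleep(poll)
                server = nova.servers.get(server.id)
            if server.status == "ACTIVE":
                return server
            # Clean up the failed instance before trying again.
            nova.servers.delete(server.id)
        raise RuntimeError("boot failed after %d attempts" % attempts)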
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators