On Sat, Jan 25, 2014 at 7:11 AM, Clint Byrum <cl...@fewbar.com> wrote:
>
> Excerpts from Robert Collins's message of 2014-01-25 02:47:42 -0800:
> > On 25 January 2014 19:42, Clint Byrum <cl...@fewbar.com> wrote:
> > > Excerpts from Robert Collins's message of 2014-01-24 18:48:41 -0800:
> > >
> > >> > However, in looking at how Ironic works and interacts with Nova, it
> > >> > doesn't seem like there is any distinction of data per-compute-node
> > >> > inside Ironic. So for this to work, I'd have to run a whole bunch of
> > >> > ironic instances, one per compute node. That seems like something we
> > >> > don't want to do.
> > >>
> > >> Huh?
> > >>
> > >
> > > I can't find anything in Ironic that lets you group nodes by anything
> > > except chassis. It was not a serious discussion of how the problem would
> > > be solved, just a point that without some way to tie ironic nodes to
> > > compute-nodes I'd have to run multiple ironics.
> >
> > I don't understand the point. There is no tie between ironic nodes and
> > compute nodes. Why do you want one?
> >
>
> Because sans Ironic, compute-nodes still have physical characteristics
> that make grouping on them attractive for things like anti-affinity. I
> don't really want my HA instances "not on the same compute node", I want
> them "not in the same failure domain". It becomes a way for all
> OpenStack workloads to have more granularity than "availability zone".
Yes, and with Ironic these same characteristics are still desirable, but they are no longer properties of a nova-compute node; they're properties of the hardware that Ironic manages. In principle, the same (hypothetical) failure-domain-aware scheduling could still be done if Ironic exposes that sort of group awareness, as long as the nova "ironic" driver passes the information up to the scheduler in a sane way. In that case, Ironic would need to represent such information, even if it doesn't act on it, which I think is trivial for us to do. (A rough sketch of what I mean is at the end of this email.)

> So if we have all of that modeled in compute-nodes, then when adding
> physical hardware to Ironic one just needs to have something to model
> the same relationship for each physical hardware node. We don't have to
> do it by linking hardware nodes to compute-nodes, but that would be
> doable for a first cut without much change to Ironic.
>

By binding hardware to nova-compute, you're trading away fault-tolerance in your control plane for failure-domain awareness. Ironic is designed explicitly to decouple the instances of Ironic (and Nova) within the control plane from the hardware being managed. That coupling is one of the main shortcomings of nova-baremetal, and it doesn't seem like a worthy trade, even for a first approximation.

> > >> The changes to Nova would be massive and invasive as they would be
> > >> redefining the driver api....and all the logic around it.
> > >>
> > >
> > > I'm not sure I follow you at all. I'm suggesting that the scheduler have
> > > a new thing to filter on, and that compute nodes push their unique ID
> > > down into the Ironic driver so that while setting up nodes in Ironic one
> > > can assign them to a compute node. That doesn't sound massive and
> > > invasive.

This is already being done *within* Ironic: nodes are mapped dynamically to ironic-conductor instances. The coordination for failover/takeover needs to be improved, but that's incremental at this point. Moving this mapping outside of Ironic would be messy and complicated, and would break the abstraction layer. The API change may seem small, but it would massively overcomplicate Nova by duplicating the functionality of ironic-conductor in another layer of the stack.

> > I think we're perhaps talking about different things - in the section
> > you were answering, I thought he was talking about whether the API
> > should offer operations on arbitrary sets of nodes at once, or whether
> > each operation should be a separate API call vs what I now think you
> > were talking about which was whether operations should be able to
> > describe logical relations to other instances/nodes. Perhaps if we use
> > the term 'batch' rather than 'group' to talk about the
> > multiple-things-at-once aspect, and grouping to talk about the
> > primarily scheduler related problems of affinity / anti affinity etc,
> > we can avoid future confusion.
> >
>
> Yes, thats a good point. I was talking about modeling failure domains
> only. Batching API requests seems like an entirely different thing.
>

I was conflating these terms: I was talking about both "grouping actions" (batching) and "groups of nodes" (groups). That said, there are really three distinct topics here, so let's break groups down further: a "logical group" for failure domains, and a "hardware group" for hardware which is physically interdependent in such a way that changes to one node affect other node(s).
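To sketch the first point above -- Ironic recording a failure-domain label on a node, the nova "ironic" driver surfacing it, and a scheduler filter acting on it -- here is a purely illustrative bit of Python. This is not actual Nova or Ironic code; the 'failure_domain' property key and the filter function are made-up names for the sake of the example.

    # Hypothetical illustration only -- not actual Nova or Ironic code.

    # 1. Ironic records a failure-domain label in a node's properties
    #    (Ironic already stores free-form key/value properties per node;
    #    'failure_domain' is a made-up key for this example).
    ironic_node = {
        'uuid': 'aaaa-1111',
        'properties': {'cpus': 8, 'memory_mb': 16384,
                       'failure_domain': 'rack-42'},
    }

    # 2. The nova "ironic" driver would surface that label in the stats
    #    it reports for the corresponding (host, node) resource.
    def node_stats(node):
        return {'failure_domain': node['properties'].get('failure_domain')}

    # 3. A hypothetical scheduler filter then rejects any node whose
    #    failure domain is already used by members of the instance group.
    def passes_anti_affinity(stats, domains_in_use):
        return stats.get('failure_domain') not in domains_in_use

    print(passes_anti_affinity(node_stats(ironic_node), {'rack-7'}))   # True
    print(passes_anti_affinity(node_stats(ironic_node), {'rack-42'}))  # False

The point being: Ironic only has to carry and expose the label; the actual anti-affinity decision stays in the scheduler.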
Regards,
Deva