On 10 December 2013 09:55, Tzu-Mainn Chen <tzuma...@redhat.com> wrote:
>> >         * created as part of undercloud install process
>> By that note I meant, that Nodes are not resources, Resource instances
>> run on Nodes. Nodes are the generic pool of hardware we can deploy
>> things onto.
>
> I don't think "resource nodes" is intended to imply that nodes are
> resources; rather, it's supposed to indicate that it's a node where a
> resource instance runs. It's supposed to separate it from "management
> node" and "unallocated node".

So the question is are we looking at /nodes/ that have a /current
role/, or are we looking at /roles/ that have some /current nodes/.

My contention is that the role is the interesting thing, and the nodes
are the incidental thing. That is, as a sysadmin, my hierarchy of
concerns is something like:
 A: are all services running
 B: are any of them in a degraded state where I need to take prompt
    action to prevent a service outage [might mean many things:
    software update/disk space criticals/a machine failed and we need
    to scale the cluster back up/too much load]
 C: are there any planned changes I need to make [new software deploy,
    feature request from user, replacing a faulty machine]
 D: are there long term issues sneaking up on me [capacity planning,
    machine obsolescence]

If we take /nodes/ as the interesting thing, and what they are doing
right now as the incidental thing, it's much harder to map that onto
the sysadmin concerns. If we start with /roles/ then we can answer:
 A: by showing the list of roles and the summary stats (how many
    machines, service status aggregate), role level alerts (e.g.
    nova-api is not responding)
 B: by showing the list of roles and more detailed stats (overall
    load, response times of services, tickets against services) and a
    list of in-trouble instances in each role - instances with alerts
    against them - low disk, overload, failed service, early-detection
    alerts from hardware
 C: probably out of our remit for now in the general case, but we need
    to enable some things here like replacing faulty machines
 D: by looking at trend graphs for roles (not machines), but also by
    looking at the hardware in aggregate - breakdown by age of
    machines, summary data for tickets filed against instances that
    were deployed to a particular machine

C: and D: are (F) category work, but for all but the very last thing,
it seems clear how to approach this from a roles perspective.

I've tried to approach this using /nodes/ as the starting point, and
after two terrible drafts I've deleted the section. I'd love it if
someone could show me how it would work :)

>> >     * Unallocated nodes
>> >
>> > This implies an 'allocation' step, that we don't have - how about
>> > 'Idle nodes' or something.
>> >
>> > It can be auto-allocation. I don't see problem with 'unallocated' term.
>>
>> Ok, it's not a biggy. I do think it will frame things poorly and lead
>> to an expectation about how TripleO works that doesn't match how it
>> does, but we can change it later if I'm right, and if I'm wrong, well
>> it won't be the first time :).
>>
>
> I'm interested in what the distinction you're making here is. I'd
> rather get things defined correctly the first time, and it's very
> possible that I'm missing a fundamental definition here.

So we have:
 - node - a physical general purpose machine capable of running in
   many roles. Some nodes may have hardware layout that is particularly
   useful for a given role.
 - role - a specific workload we want to map onto one or more nodes.
   Examples include 'undercloud control plane', 'overcloud control
   plane', 'overcloud storage', 'overcloud compute' etc.
 - instance - A role deployed on a node - this is where work actually
   happens.
 - scheduling - the process of deciding which role is deployed on
   which node.

The way TripleO works is that we define a Heat template that lays out
policy: '5 instances of overcloud control plane please', '20
hypervisors' etc. Heat passes that to Nova, which pulls the image for
the role out of Glance, picks a node, and deploys the image to the
node.

Note in particular the order: Heat -> Nova -> Scheduler -> Node chosen.

The user action is not 'allocate a Node to overcloud control plane',
it is 'size the control plane through heat'.

So when we talk about 'unallocated Nodes', the implication is that
users 'allocate Nodes', but they don't: they size roles, and after
doing all that there may be some Nodes that are - yes - unallocated,
or have nothing scheduled to them. So... I'm not debating that we
should have a list of free hardware - we totally should - I'm debating
how we frame it. 'Available Nodes' or 'Undeployed machines' or
whatever. I just want to get away from talking about something
([manual] allocation) that we don't offer.

-Rob

--
Robert Collins <rbtcoll...@hp.com>
Distinguished Technologist
HP Converged Cloud

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev