Writing from my phone... May I ask that, before you proceed with any plan that uses traits for state information, we have a hangout or videoconference to discuss this? Unfortunately, I'm not able to do a hangout today or tomorrow, but I can do one on Wednesday at any time of the day.
Lemme know! -jay

On Oct 23, 2017 5:01 AM, "Dmitry Tantsur" <dtant...@redhat.com> wrote:
> Hi Jay!
>
> I appreciate your comments, but I think you're approaching the problem
> purely from a VM point of view. Things simply don't work the same way
> in bare metal, at least not if we want to provide the same user
> experience.
>
> On Sun, Oct 22, 2017 at 2:25 PM, Jay Pipes <jaypi...@gmail.com> wrote:
>> Sorry for the delay; I took a week off before starting a new job.
>> Comments inline.
>>
>> On 10/16/2017 12:24 PM, Dmitry Tantsur wrote:
>>> Hi all,
>>>
>>> I promised John to dump my thoughts on traits to the ML, so here we
>>> go :)
>>>
>>> I see two roles of traits (or kinds of traits) for bare metal:
>>> 1. traits that say what the node can do already (e.g. "the node is
>>> doing UEFI boot")
>>> 2. traits that say what the node can be *configured* to do (e.g.
>>> "the node can boot in UEFI mode")
>>
>> There's only one role for traits: #2 above. #1 is state information.
>> Traits are not for state information. Traits are only for
>> communicating the capabilities of a resource provider (baremetal
>> node).
>
> These are not different; that's what I'm talking about here. No user
> cares about the difference between "this node was put in UEFI mode by
> an operator in advance", "this node was put in UEFI mode by an ironic
> driver on demand" and "this node is always in UEFI mode, because it's
> AARCH64 and does not have a BIOS". These situations produce the same
> result (the node is booted in UEFI mode), and thus it's up to ironic
> to hide the difference.
>
> My suggestion with traits is one way to do it; I'm not sure what you
> suggest instead.
>
>> For example, let's say we add the following to the os-traits
>> library [1]:
>>
>> * STORAGE_RAID_0
>> * STORAGE_RAID_1
>> * STORAGE_RAID_5
>> * STORAGE_RAID_6
>> * STORAGE_RAID_10
>>
>> The Ironic administrator would add all RAID-related traits to the
>> baremetal nodes that had the *capability* of supporting that
>> particular RAID setup [2].
>>
>> When provisioned, the baremetal node would either have RAID
>> configured at a certain level or not configured at all.
>>
>> A very important note: the Placement API and the Nova scheduler (or
>> a future Ironic scheduler) don't care about this. At all. I know it
>> sounds like I'm being callous, but I'm not. Placement and scheduling
>> don't care about the state of things. They only care about the
>> capabilities of target destinations. That's it.
>
> Yes, because VMs always start with a clean state, and the hypervisor
> is there to ensure that. We don't have this luxury in ironic :) E.g.
> our SNMP driver is not even aware of boot modes (or RAID, or BIOS
> configuration), which does not mean that a node using it cannot be in
> UEFI mode (or have RAID or BIOS pre-configured, etc).
>
>>> This seems confusing, but it's actually very useful. Say I have a
>>> flavor that requests UEFI boot via a trait. It will match both the
>>> nodes that are already in UEFI mode and the nodes that can be put
>>> in UEFI mode.
>>
>> No :) It will only match nodes that have the UEFI capability. The
>> set of providers that have the ability to be booted via UEFI is
>> *always* a superset of the set of providers that *have been booted
>> via UEFI*. Placement and scheduling decisions only care about that
>> superset -- the providers with a particular capability.
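A minimal Python sketch of the capability-only matching Jay describes
(this is an illustration, not Placement's actual code; the trait name
CUSTOM_BOOT_MODE_UEFI and the node records are made up):

    # A node matches when its capability set contains every required
    # trait; the node's current state never enters the decision.
    REQUIRED_TRAITS = {'CUSTOM_BOOT_MODE_UEFI'}  # hypothetical trait

    nodes = [
        # 'traits' are capabilities; 'boot_mode' is current state.
        {'uuid': 'node-1', 'traits': {'CUSTOM_BOOT_MODE_UEFI'},
         'boot_mode': 'uefi'},
        {'uuid': 'node-2', 'traits': {'CUSTOM_BOOT_MODE_UEFI'},
         'boot_mode': 'bios'},
        {'uuid': 'node-3', 'traits': set(), 'boot_mode': 'bios'},
    ]

    def matching_nodes(nodes, required):
        """Return nodes whose traits are a superset of the request."""
        return [n for n in nodes if required <= n['traits']]

    # node-1 (already in UEFI mode) and node-2 (can be put in UEFI
    # mode) both match; the filter never looks at 'boot_mode'.
    for node in matching_nodes(nodes, REQUIRED_TRAITS):
        print(node['uuid'])

This is exactly the superset behaviour in dispute: the filter cannot
distinguish a node that *is* in UEFI mode from one that merely *can*
be, which is Jay's point, and also why Dmitry argues that ironic must
hide the difference itself.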
> Well, no, it will. Again, you're basing this purely on the VM idea,
> where a VM is always *put* in UEFI mode, no matter what the hypervisor
> looks like. That is simply not the case for us. You have to care what
> state the node is in, because many drivers cannot change this state.
>
>>> This idea goes further with deploy templates (a new concept we've
>>> been thinking about). A flavor can request something like
>>> CUSTOM_RAID_5, and it will match the nodes that already have RAID 5
>>> or, more interestingly, the nodes on which we can build RAID 5
>>> before deployment. The UEFI example above can be treated in a
>>> similar way.
>>>
>>> This ends up with two sources of knowledge about traits in ironic:
>>> 1. Operators setting something they know about hardware ("this node
>>> is in UEFI mode"),
>>> 2. Ironic drivers reporting something they
>>> 2.1. know about hardware ("this node is in UEFI mode" - again)
>>> 2.2. can do about hardware ("I can put this node in UEFI mode")
>>
>> You're correct that both pieces of information are important.
>> However, only the "can do about hardware" part is relevant to
>> Placement and Nova.
>>
>>> For case #1 we are planning a new CRUD API to set/unset traits for
>>> a node.
>>
>> I would *strongly* advise against this. Traits are not for state
>> information.
>>
>> Instead, consider having a DB (or JSON) schema that lists state
>> information in fields that are explicitly for that state
>> information.
>>
>> For example, a schema that looks like this:
>>
>> {
>>   "boot": {
>>     "mode": <one of 'bios' or 'uefi'>,
>>     "params": <dict>
>>   },
>>   "disk": {
>>     "raid": {
>>       "level": <int>,
>>       "controller": <one of 'sw' or 'hw'>,
>>       "driver": <string>,
>>       "params": <dict>
>>     },
>>     ...
>>   },
>>   "network": {
>>     ...
>>   }
>> }
>>
>> etc, etc.
>>
>> Don't use trait strings to represent state information.
>
> I don't see an alternative proposal that will satisfy what we have to
> solve.
>
>> Best,
>> -jay
>>
>>> Case #2 is more interesting. We have two options, I think:
>>>
>>> a) Operators still set traits on nodes, and drivers simply validate
>>> them. E.g. an operator sets CUSTOM_RAID_5, and the node's RAID
>>> interface checks if it is possible to do. The downside is obvious -
>>> with a lot of deploy templates available, it can be a lot of manual
>>> work.
>>>
>>> b) Drivers report the traits, and they somehow get added to the
>>> traits provided by an operator. Technically, there are sub-cases
>>> again:
>>> b.1) The new traits API returns a union of operator-provided and
>>> driver-provided traits
>>> b.2) The new traits API returns only operator-provided traits;
>>> driver-provided traits are returned e.g. via a new field
>>> (node.driver_traits). Then nova will have to merge the lists itself
>>> (a sketch of such a merge follows this mail).
>>>
>>> My personal favorite is the last option: I'd like a clear
>>> distinction between different "sources" of traits, but I'd also
>>> like to reduce the manual work for operators.
>>>
>>> A valid counter-argument is: what if an operator wants to override
>>> a driver-provided trait? E.g. a node can do RAID 5, but I don't
>>> want this particular node to do it for whatever reason. I'm not
>>> sure if it's a valid case, or what to do about it.
>>>
>>> Let me know what you think.
>>>
>>> Dmitry
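A minimal Python sketch of the merge nova would have to do under
option b.2. The node.driver_traits field is the one proposed above;
the excluded_traits deny-list is purely hypothetical, added only to
illustrate the operator-override question:

    def effective_traits(node):
        """Union operator and driver traits, minus explicit overrides."""
        operator = set(node.get('traits', []))       # set via CRUD API
        driver = set(node.get('driver_traits', []))  # reported by driver
        excluded = set(node.get('excluded_traits', []))  # hypothetical
        return (operator | driver) - excluded

    node = {
        'traits': ['CUSTOM_FOO'],
        'driver_traits': ['CUSTOM_RAID_5', 'CUSTOM_BOOT_MODE_UEFI'],
        'excluded_traits': ['CUSTOM_RAID_5'],  # "this node can, but don't"
    }

    # -> {'CUSTOM_FOO', 'CUSTOM_BOOT_MODE_UEFI'}
    print(effective_traits(node))

Keeping the two sources in separate fields preserves the distinction
Dmitry wants, while the union still spares operators the manual work
of copying driver capabilities by hand.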
>> [1] http://git.openstack.org/cgit/openstack/os-traits/tree/
>> [2] Based on how many attached disks the node has, the presence and
>> abilities of a hardware RAID controller, etc.
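Returning to Jay's state-schema suggestion above: a minimal sketch of
what a node's state document might look like, kept in explicit fields
separate from capability traits. The field names mirror his example;
the use of the third-party jsonschema library and the concrete values
(e.g. the 'megaraid' driver string) are assumptions for illustration:

    import jsonschema

    STATE_SCHEMA = {
        'type': 'object',
        'properties': {
            'boot': {
                'type': 'object',
                'properties': {
                    'mode': {'enum': ['bios', 'uefi']},
                    'params': {'type': 'object'},
                },
            },
            'disk': {
                'type': 'object',
                'properties': {
                    'raid': {
                        'type': 'object',
                        'properties': {
                            'level': {'type': 'integer'},
                            'controller': {'enum': ['sw', 'hw']},
                            'driver': {'type': 'string'},
                            'params': {'type': 'object'},
                        },
                    },
                },
            },
        },
    }

    # Current state of one node: RAID 5 already built, UEFI boot.
    node_state = {
        'boot': {'mode': 'uefi', 'params': {}},
        'disk': {'raid': {'level': 5, 'controller': 'hw',
                          'driver': 'megaraid', 'params': {}}},
    }

    # Raises jsonschema.ValidationError if the document is malformed.
    jsonschema.validate(node_state, STATE_SCHEMA)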