We had three design summit sessions related to the new placement service and resource providers work. Since they are all more or less related, I'm going to recap them in a single email.

----

The first session was a retrospective on the placement service work that happened in the Newton release. The full etherpad is here:

https://etherpad.openstack.org/p/ocata-nova-summit-placement-retrospective

We first talked about what went well, and there were many things:

- There is a better shared understanding of the design and goals among more people in Nova.
- Computes in Newton are reporting their RAM/DISK/CPU inventory and usage.
- We have CI jobs.
- Jay did a nice job of using consistent terminology when discussing resource providers and the end goal for Newton so we could stay focused.
- Hangouts helped the team get unstuck at times when we were grinding toward feature freeze.
- The placement API has a clean WSGI design and REST interface that others are able to build onto easily.

We then talked about what didn't go so well, which included:

- Confusion around division of labor and when different chunks can be worked in parallel, and by whom.
- There was too much time spent on making the specs perfect and we needed to just start writing and reviewing code. This was especially evident when the client side (resource tracker) pieces started getting written that used the placement REST API and required changes to the API.
- At times there were key discussions/decisions that were not properly documented/communicated back to the wider team.
- There was a breakdown in communication at or after the midcycle about the separate placement DB which led to a revert late in the cycle.
- General burnout and frustration.
- The trap of working on a long patch series with little review feedback early in the series, or with slow reviews, leading to wasted time.

From those discussions, we listed what we should keep doing or do differently:

- Write specs with less low-level detail, but if there is that level of detail, make sure to amend the spec later if there are changes once implemented.
- Use Hangouts when we get stuck.
- Document/communicate decisions/agreements/changes in direction in the mailing list.
- Encourage people to pair up for redundancy.
- Encourage early PoCs before building a long and potentially off the mark patch series.

There was also some general discussion about not moving specs to 'implemented' until the spec is updated after the code is all approved. I was personally not sold on what was proposed for this, since I consider amending specs to be like writing documentation and CI tests: if you don't -2 the last change in the series to complete the blueprint, people have little incentive to actually do it, and once their code is merged it's very hard to get them to do the ancillary tasks. I'm open to further discussing this idea though in case I missed the point.

----

The next session was about the quantitative side of resource providers. The full etherpad is here:

https://etherpad.openstack.org/p/ocata-nova-summit-resource-providers-quantitative

There were quite a few things in the etherpad and we didn't get to all of them, so this is a recap of what we did talk about.

- Custom resource classes

The code for this is moving along and being reviewed. There will be namespaces on the standard resource classes that nova provides. The resource tracker will create inventory/allocation records for the Ironic nodes. The Ironic inventory records will use the node.resource_class value as the custom resource class.

We still need to figure out what to do about mapping a single flavor to multiple node classes, but it might just be done with extra_specs. There will be upgrade impacts, however, if flavors are not properly mapped when the scheduler starts using the placement service.
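
To make the Ironic inventory idea a bit more concrete, here is a minimal sketch of turning a node.resource_class value into a custom resource class and a single-unit inventory record. The "CUSTOM_" prefix, the normalization rule, the function names and the record fields are all assumptions for illustration; the session did not settle any of them.

```python
import re

# Assumed prefix for illustration only; the namespacing scheme is not decided.
CUSTOM_NAMESPACE = "CUSTOM_"


def custom_resource_class(node_resource_class):
    """Turn an Ironic node.resource_class value into a namespaced name."""
    # Upper-case the value and replace anything outside A-Z, 0-9 and _
    # with an underscore (a hypothetical normalization rule).
    normalized = re.sub(r"[^A-Z0-9_]", "_", node_resource_class.upper())
    return CUSTOM_NAMESPACE + normalized


def ironic_inventory(node_resource_class):
    """Build a hypothetical inventory record for one bare-metal node."""
    # A bare-metal node is consumed whole, so it exposes exactly one unit.
    return {
        "resource_class": custom_resource_class(node_resource_class),
        "total": 1,
        "reserved": 0,
        "min_unit": 1,
        "max_unit": 1,
        "step_size": 1,
        "allocation_ratio": 1.0,
    }
```

With something like this, a node whose resource_class is "baremetal-gold" would report one unit of a single custom class, and a flavor would then need to ask for that class (possibly via extra_specs, as mentioned above).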

- Microversions

Chris Dent has a patch up to add microversion support to the placement API and it's being reviewed.

- Nested resource providers

Jay has been working on code for this and has a design in mind. Jay and Ed did some whiteboarding in the hall and sorted out their differences on the design and have agreement on the way forward (which is Jay's nesting/tree model).
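
As a rough illustration of what a nesting/tree model means here, the sketch below models providers with a parent pointer and a list of children. This is not Jay's actual design or code; the class, its methods and the example provider names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Provider:
    """A hypothetical resource provider node in a tree."""
    name: str
    parent: Optional["Provider"] = None
    children: List["Provider"] = field(default_factory=list)

    def add_child(self, name):
        """Create a child provider nested under this one."""
        child = Provider(name=name, parent=self)
        self.children.append(child)
        return child

    def root(self):
        """Follow parent pointers up to the top-level provider."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node


def walk(provider):
    """Yield the provider and all of its descendants, depth first."""
    yield provider
    for child in provider.children:
        yield from walk(child)
```

For example, a compute node provider could have a child provider "numa0", which in turn has a child "pf0"; calling root() on "pf0" would return the compute node.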

- Documenting the placement REST API

We didn't get into this at the summit, but in side discussions it's a TODO and right now we'll most likely handle this like we do for the compute api-ref.

- Top priorities for Ocata

1. The scheduler calling the placement API to get a list of resource providers. There are some specs and WIP code up that Sylvain is working on. Note that this is not going to involve the caching scheduler for now; we'll worry about that later.

2. Start handling shared storage. We need the resource tracker and/or an external script to create the resource provider / aggregate mapping and inventory/allocation records against shared DISK_GB inventories. The aggregates mapping modeling work in the placement API is underway.
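
The two priorities above can be sketched together in plain Python: pick the compute providers that have room for a request, falling back to a sharing provider in a common aggregate for anything the compute node does not provide itself (for example DISK_GB on shared storage). The data layout, the "shared" flag and the function names are assumptions for illustration, not the placement API or its schema.

```python
def has_room(provider, resource_class, amount):
    """Check whether a provider has enough free units of a resource class."""
    inv = provider["inventories"].get(resource_class)
    if inv is None:
        return False
    # Capacity is (total - reserved) scaled by the allocation ratio.
    capacity = (inv["total"] - inv["reserved"]) * inv["allocation_ratio"]
    return capacity - inv["used"] >= amount


def candidate_providers(providers, request):
    """Return names of non-sharing providers that can satisfy the request.

    providers: dict of name -> {"aggregates": set, "shared": bool,
                                "inventories": {resource_class: {...}}}
    request:   dict of resource_class -> requested amount
    """
    sharing = [p for p in providers.values() if p["shared"]]
    matches = []
    for name, provider in providers.items():
        if provider["shared"]:
            continue  # sharing providers only lend resources to others
        satisfied = True
        for resource_class, amount in request.items():
            if has_room(provider, resource_class, amount):
                continue
            # Fall back to a sharing provider in a common aggregate.
            if any(provider["aggregates"] & s["aggregates"]
                   and has_room(s, resource_class, amount)
                   for s in sharing):
                continue
            satisfied = False
            break
        if satisfied:
            matches.append(name)
    return matches
```

In this sketch, a compute node with no local DISK_GB inventory is still a candidate as long as it is in the same aggregate as a shared storage provider with enough free space.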

- What's required when upgrading to Ocata

1. The placement service is required to upgrade to Ocata. You'll break in Ocata if you don't have this because the scheduler will be using the placement service for scheduling decisions. The idea is to stand up the placement service in Newton, get the resource provider (compute node) data populated and then upgrade.

TODO: We need to be more clear about this in the release notes and upgrade docs.

2. The aforementioned mapping of Ironic flavors to multiple node resource classes. This is still a TBD though.

----

The final resource providers session focused on qualitative aspects, which are the traits on a given resource provider. The full session etherpad is here:

https://etherpad.openstack.org/p/ocata-nova-summit-resource-providers-qualitative

Most of the session was spent talking about the proposed traits REST API and different use cases, along with some clarification on rules around traits:

- They can't be negative.
- Preferred/required traits will be part of the request spec, not tagged on a trait itself. How this is worked into the request spec is TBD.
- Image metadata / flavor extra specs will need to be handled at some point but it's not a top priority right now.
- There will be no ACLs on traits.
- The traits APIs will be admin-only for now.
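
To illustrate the first two rules, here is a small sketch where traits are plain set members on a provider (a provider either has a trait or it doesn't, so there is no negative form), and required/preferred lives on the request side. How this actually ends up in the request spec is still TBD, so the function, its arguments and the trait names used below are hypothetical.

```python
def rank_by_traits(providers, required, preferred):
    """Filter providers by required traits, then rank by preferred traits.

    providers: dict of provider name -> set of trait names
    required:  traits a provider must have to be eligible
    preferred: traits that make an eligible provider rank higher
    """
    required = set(required)
    preferred = set(preferred)
    # Keep only providers that have every required trait.
    eligible = {name: traits for name, traits in providers.items()
                if required <= traits}
    # More matching preferred traits sorts first; ties break by name.
    return sorted(eligible,
                  key=lambda name: (-len(preferred & eligible[name]), name))
```

For example, with providers {"cn1": {"SSD", "AVX2"}, "cn2": {"SSD"}, "cn3": {"AVX2"}}, requiring "SSD" and preferring "AVX2" drops cn3 and ranks cn1 ahead of cn2.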

The direction for Ocata is to:

- Spend less time on the spec and start working on some proof of concept code, especially on the client side to help shape the needs of the REST API.
- Create a spec for namespaces on custom traits which will mirror how we handle namespaces for custom resource classes.
- Move the os-traits library under the Compute program wrt governance.

--

Thanks,

Matt Riedemann


