On 08/13/2014 06:05 PM, Sylvain Bauza wrote: > > Le 13/08/2014 12:21, Sylvain Bauza a écrit : >> >> Le 12/08/2014 22:06, Sylvain Bauza a écrit : >>> >>> Le 12/08/2014 18:54, Nikola Đipanov a écrit : >>>> On 08/12/2014 04:49 PM, Sylvain Bauza wrote: >>>>> (sorry for reposting, missed 2 links...) >>>>> >>>>> Hi Nikola, >>>>> >>>>> Le 12/08/2014 12:21, Nikola Đipanov a écrit : >>>>>> Hey Nova-istas, >>>>>> >>>>>> While I was hacking on [1] I was considering how to approach the fact >>>>>> that we now need to track one more thing (NUMA node utilization) >>>>>> in our >>>>>> resources. I went with - "I'll add it to compute nodes table" >>>>>> thinking >>>>>> it's a fundamental enough property of a compute host that it >>>>>> deserves to >>>>>> be there, although I was considering Extensible Resource Tracker >>>>>> at one >>>>>> point (ERT from now on - see [2]) but looking at the code - it did >>>>>> not >>>>>> seem to provide anything I desperately needed, so I went with >>>>>> keeping it >>>>>> simple. >>>>>> >>>>>> So fast-forward a few days, and I caught myself solving a problem >>>>>> that I >>>>>> kept thinking ERT should have solved - but apparently hasn't, and I >>>>>> think it is fundamentally a broken design without it - so I'd really >>>>>> like to see it re-visited. >>>>>> >>>>>> The problem can be described by the following lemma (if you take >>>>>> 'lemma' >>>>>> to mean 'a sentence I came up with just now' :)): >>>>>> >>>>>> """ >>>>>> Due to the way scheduling works in Nova (roughly: pick a host >>>>>> based on >>>>>> stale(ish) data, rely on claims to trigger a re-schedule), _same >>>>>> exact_ >>>>>> information that scheduling service used when making a placement >>>>>> decision, needs to be available to the compute service when >>>>>> testing the >>>>>> placement. >>>>>> """ >>>>>> >>>>>> This is not the case right now, and the ERT does not propose any >>>>>> way to >>>>>> solve it - (see how I hacked around needing to be able to get >>>>>> extra_specs when making claims in [3], without hammering the DB). The >>>>>> result will be that any resource that we add and needs user supplied >>>>>> info for scheduling an instance against it, will need a buggy >>>>>> re-implementation of gathering all the bits from the request that >>>>>> scheduler sees, to be able to work properly. >>>>> Well, ERT does provide a plugin mechanism for testing resources at the >>>>> claim level. This is the plugin responsibility to implement a test() >>>>> method [2.1] which will be called when test_claim() [2.2] >>>>> >>>>> So, provided this method is implemented, a local host check can be >>>>> done >>>>> based on the host's view of resources. >>>>> >>>>> >>>> Yes - the problem is there is no clear API to get all the needed >>>> bits to >>>> do so - especially the user supplied one from image and flavors. >>>> On top of that, in current implementation we only pass a hand-wavy >>>> 'usage' blob in. This makes anyone wanting to use this in conjunction >>>> with some of the user supplied bits roll their own >>>> 'extract_data_from_instance_metadata_flavor_image' or similar which is >>>> horrible and also likely bad for performance. >>> >>> I see your concern where there is no interface for user-facing >>> resources like flavor or image metadata. >>> I also think indeed that the big 'usage' blob is not a good choice >>> for long-term vision. >>> >>> That said, I don't think as we say in French to throw the bath >>> water... ie. the problem is with the RT, not the ERT (apart the >>> mention of third-party API that you noted - I'll go to it later below) >>>>>> This is obviously a bigger concern when we want to allow users to >>>>>> pass >>>>>> data (through image or flavor) that can affect scheduling, but >>>>>> still a >>>>>> huge concern IMHO. >>>>> And here is where I agree with you : at the moment, ResourceTracker >>>>> (and >>>>> consequently Extensible RT) only provides the view of the resources >>>>> the >>>>> host is knowing (see my point above) and possibly some other resources >>>>> are missing. >>>>> So, whatever your choice of going with or without ERT, your patch [3] >>>>> still deserves it if we want not to lookup DB each time a claim goes. >>>>> >>>>> >>>>>> As I see that there are already BPs proposing to use this IMHO broken >>>>>> ERT ([4] for example), which will surely add to the proliferation of >>>>>> code that hacks around these design shortcomings in what is already a >>>>>> messy, but also crucial (for perf as well as features) bit of Nova >>>>>> code. >>>>> Two distinct implementations of that spec (ie. instances and flavors) >>>>> have been proposed [2.3] [2.4] so reviews are welcome. If you see the >>>>> test() method, it's no-op thing for both plugins. I'm open to comments >>>>> because I have the stated problem : how can we define a limit on >>>>> just a >>>>> counter of instances and flavors ? >>>>> >>>> Will look at these - but none of them seem to hit the issue I am >>>> complaining about, and that is that it will need to consider other >>>> request data for claims, not only data available for on instances. >>>> >>>> Also - the fact that you don't implement test() in flavor ones tells me >>>> that the implementation is indeed racy (but it is racy atm as well) and >>>> two requests can indeed race for the same host, and since no claims are >>>> done, both can succeed. This is I believe (at least in case of single >>>> flavor hosts) unlikely to happen in practice, but you get the idea. >>> >>> Agreed, these 2 patches probably require another iteration, in >>> particular how we make sure that it won't be racy. So I need another >>> run to think about what to test() for these 2 examples. >>> Another patch has to be done for aggregates, but it's still WIP so >>> not mentioned here. >>> >>> Anyway, as discussed during today's meeting, these 2 patches will not >>> be based on ERT because of the risk it goes for Juno so let's scope >>> them out of this thread. >>> >>> >>>>> >>>>>> I propose to revert [2] ASAP since it is still fresh, and see how >>>>>> we can >>>>>> come up with a cleaner design. >>>>>> >>>>>> Would like to hear opinions on this, before I propose the patch tho! >>>>> IMHO, I think the problem is more likely that the regular RT misses >>>>> some >>>>> information for each host so it requires to handle it on a >>>>> case-by-case >>>>> basis, but I don't think ERT either increases complexity or creates >>>>> another issue. >>>>> >>>> RT does not miss info about the host, but about the particular request >>>> which we have to fish out of different places like image_metadata >>>> extra_specs etc, yet - it can't really work without them. This is >>>> definitely a RT issue that is not specific to ERT. >>> >>> +1, I agree with you, that's an API issue for RT : how do we pass out >>> user-defined metrics ? >>> I still need to figure out which kind of usecases are requiring such >>> examples, albeit the NUMA usecase of course. >>> >> >> After a night and morning of thinking about this problem, I'm writing >> down some ideas that could help fixing the problem (although I don't >> think it's a long-term scenario, rather a workaround) : >> >> Considering you need the metadata of the flavor asked by user when >> claiming a resource : >> 1. Compute Manager knows request_spec when building an instance, so it >> could pass out request_spec as paramater (optional or not) to >> rt.instance_claim() in the claim context >> 2. As ERT does, when claiming, it calls the ERT plugin .test() method >> so with request_spec as extra parameter >> 3. On the ERT side, a plugin is responsible for calling (and caching) >> all flavors metadata, or even only the key you need to compare >> 4. Scheduler knows flavors metadata thanks to ERT plugin and blueprint >> isolate-scheduler-db so it makes decisions on the same subset as the RT >> 5. When claiming in ERT plugin test(), it extracts the flavor_id, >> looks at the internal in-memory representation of flavors to get the >> metadata, and calls the NUMA _test_numa_topology() method for checking >> wrt this flavor metadata >> >> >> The idea is that so filters and claims are doing separate decisions on >> the same source of knowledge, here being ERT. >> >> -Sylvain >> >> > > Just made a draft over my thoughts here : > https://review.openstack.org/113936 >
I have commented on the review itself in more detail, but in short - this solution won't work for me, and thus in the general case as well, as 1) we need a way to persist the request data (and not make it a horrible performance bottleneck), as caching won't cut it. 2) We also need a way to use it in every periodic task run. None of this is only ERT problem, they are problems of the RT itself, but as I have said in the thread already - adding a plugin API on top of something that is broken is just asking for all sorts of trouble. Still leaning towards revert really - anything else would be short sighted. I would be happy to help come up with a baseline of things to fix before we can add it back, to avoid deferring it needlesly due to scope creep. N. > -S >> >> >>>> >>>> However, I still see several issues with the current ERT >>>> implementation, >>>> but the one I am getting at here, and why I think we should revert it, >>>> is that we are building a 3rd party plugin API that is tightly coupled >>>> to an obviously flawed internal API (RT and Claims). >>>> >>>> We have no policy AFIK about what guarantees we provide to 3rd party >>>> plugins, but if we are going to be breaking them all the time, or in >>>> this case providing very limited usefulness - I see little value in the >>>> current implementation of ERT, and see issues with it staying, which >>>> means future work will be based on it. >>>> >>>> N. >>> >>> *This* is to me IMHO the biggest problem with ERT : if we say that we >>> externalize an API for 3rd-party plugins, we need to make sure that >>> everything is handled. >>> >>> That said, instead of reverting the whole patch, could we just >>> consider to only accept changes that wouldn't require user-facing >>> metrics ? >>> The existing VCPU implementation still sounds good to me, so we can >>> just consider a clear communication on what is acceptable and what >>> not (ie. a docstring in the plugin API or so., plus -2/-1 reviews if >>> needed) >>> >>> -Sylvain >>> >>>>> Thanks, >>>>> -Sylvain >>>>> >>>>>> Thanks all, >>>>>> >>>>>> Nikola >>>>>> >>>>>> [1] >>>>>> https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement >>>>>> >>>>>> [2] https://review.openstack.org/#/c/109643/ >>>>>> [3] https://review.openstack.org/#/c/111782/ >>>>>> [4] https://review.openstack.org/#/c/89893 >>>>>> >>>>>> _______________________________________________ >>>>>> OpenStack-dev mailing list >>>>>> OpenStack-dev@lists.openstack.org >>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >>>>> [2.1] >>>>> https://github.com/openstack/nova/blob/master/nova/compute/resources/__init__.py#L75 >>>>> >>>>> >>>>> [2.2] >>>>> https://github.com/openstack/nova/blob/master/nova/compute/claims.py#L134 >>>>> >>>>> [2.3] https://review.openstack.org/112578 >>>>> [2.4] https://review.openstack.org/113373 >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> OpenStack-dev mailing list >>>>> OpenStack-dev@lists.openstack.org >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >>>> >>>> _______________________________________________ >>>> OpenStack-dev mailing list >>>> OpenStack-dev@lists.openstack.org >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >>> >>> >>> _______________________________________________ >>> OpenStack-dev mailing list >>> OpenStack-dev@lists.openstack.org >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >> >> >> _______________________________________________ >> OpenStack-dev mailing list >> OpenStack-dev@lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > > _______________________________________________ > OpenStack-dev mailing list > OpenStack-dev@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev _______________________________________________ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev