On 19 May 2017 at 10:03, Sylvain Bauza <sba...@redhat.com> wrote:
>
> On 19/05/2017 10:02, Sylvain Bauza wrote:
>>
>> On 19/05/2017 02:55, Matt Riedemann wrote:
>>> The etherpad for this session is here [1]. The goal for this session
>>> was to inform operators and get feedback on the plan for what we're
>>> doing with moving claims from the computes to the control layer
>>> (scheduler or conductor).
>>>
>>> We mostly talked about retries, which also came up in the cells v2
>>> session that Dan Smith led [2] and which he will recap later.
>>>
>>> Without getting into too many details, in the cells v2 session we
>>> came to a compromise on build retries and said that we could pass
>>> hosts down to the cell so that the cell-level conductor could retry
>>> if needed (even though we expect doing claims at the top will fix
>>> the majority of reasons you'd have a reschedule in the first place).
>>>
>> And during that session, we said that since cell-local conductors
>> (when there is a reschedule) can't upcall the global (for all cells)
>> scheduler, we agreed to have the conductor call the Placement API for
>> allocations.
>>
>>> During the claims in the scheduler session, a new wrinkle came up,
>>> which is that the hosts the scheduler returns to the top-level
>>> conductor may be in different cells. So if we have two cells, A and
>>> B, with hosts x and y in cell A and host z in cell B, we can't send
>>> z to A for retries, or x or y to B for retries. So we need some kind
>>> of post-filter/weigher step such that hosts are grouped by cell and
>>> can then be sent to the cells for retries as necessary.
>>>
>> That's already proposed for review in
>> https://review.openstack.org/#/c/465175/
>>
>>> There was also some side discussion asking if we somehow regressed
>>> pack-first strategies by using Placement in Ocata. John Garbutt and
>>> Dan Smith have the context on this (I think) so I'm hoping they can
>>> clarify whether we really need to fix something in Ocata at this
>>> point, or whether this is more a case of closing a loophole.
>>>
>> The problem is that the scheduler doesn't take cells into account
>> when trying to find a destination for an instance; it just uses
>> weights for packing.
>>
>> So, for example, say I have N hosts and 2 cells: the best-weighted
>> host could be in cell1 while the second-best could be in cell2. Then,
>> even if the operator uses the weighers for packing, a RequestSpec
>> with num_instances=2 could place one instance in cell1 and the other
>> in cell2.
>>
>> From a scheduler point of view, I think we could possibly add a
>> CellWeigher that would help pack instances within the same cell.
>> Anyway, that's not related to the claims series, so we could
>> hopefully backport it to Ocata.
>>
> Melanie actually made a good point about the current logic based on
> the `host_subset_size` config option. If you leave it at the default
> of 1, in theory every instance coming through the scheduler gets a
> list of hosts sorted by weight and picks the first one (i.e. packing
> all the instances onto the same host), which is good for packing
> (except of course for a user request that consumes all the remaining
> space on the host, where spreading across multiple hosts would be
> better).
>
> So, while I began deprecating that option because I thought the race
> condition would be fixed by conductor claims, I think we should keep
> it for the time being, until we clearly identify whether it's still
> necessary.
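To make that concrete for anyone who hasn't read the scheduler code,
here is a minimal sketch of what a `host_subset_size`-style fuzzy
select does (illustrative Python only, not Nova's actual
FilterScheduler code; the function name and arguments are made up for
the example):

    import random

    def select_host(weighed_hosts, host_subset_size=1):
        """Pick a destination from hosts sorted best-weight-first.

        With host_subset_size=1 this always packs onto the single
        best-weighted host; a larger value picks randomly among the
        top N, trading a little packing for less contention when
        many scheduler requests race for the same host.
        """
        if not weighed_hosts:
            raise ValueError("no hosts available")
        subset = weighed_hosts[:max(1, host_subset_size)]
        return random.choice(subset)

(For reference, newer releases expose this in nova.conf as
host_subset_size under [filter_scheduler]; older releases spelled it
scheduler_host_subset_size under [DEFAULT].)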
> All that I said earlier still stands, though. In a world where the
> two best-weighted hosts are in different cells, we could send
> instances from the same user request to different cells, but that
> reduces the problem to a multi-instance boot problem, which is far
> less impactful.
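And to illustrate the grouping-by-cell idea (the post-filter step Matt
describes above, proposed in https://review.openstack.org/#/c/465175/),
something along these lines could work; this is a hand-wavy sketch,
not the actual patch, and the host.cell_uuid attribute is assumed here
purely for illustration:

    from collections import defaultdict

    def group_hosts_by_cell(weighed_hosts):
        """Return {cell_uuid: [hosts...]}, each list in weight order.

        The top-level conductor can then hand a cell-local list of
        alternates down with the build request, so a reschedule inside
        a cell never needs an upcall to the global scheduler or a
        host that lives in another cell.
        """
        by_cell = defaultdict(list)
        for host in weighed_hosts:
            by_cell[host.cell_uuid].append(host)
        return by_cell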
FWIW, I think we need to keep host_subset_size. If you have *lots* of
contention when picking your host, increasing host_subset_size should
help reduce that contention (and maybe help increase the throughput).
I haven't written a simulator to test it out, but it feels like we
will still need to keep the fuzzy select. That might just be a
different way of saying the same thing Mel was saying, not sure.

Thanks,
johnthetubaguy

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev