On 08/26/2014 04:04 PM, Ian Wienand wrote:
> I'm having a hard time getting the description in [1] to trigger after
> trying several different approaches.
A huge thank-you to jeblair for getting the logs out, and for a day of analysis.

We can see things going crazy [1] with negative allocations:

---
nodepool.NodePool: <AllocationRequest for 248.0 of bare-precise>
nodepool.NodePool: <AllocationSubRequest for -116.776699029 (out of 248.0) of bare-precise from hpcloud-b3>
nodepool.NodePool: <AllocationSubRequest for -109.553398058 (out of 248.0) of bare-precise from hpcloud-b2>
nodepool.NodePool: <AllocationSubRequest for -71.0291262136 (out of 248.0) of bare-precise from rax-iad>
nodepool.NodePool: <AllocationSubRequest for -115.572815534 (out of 248.0) of bare-precise from hpcloud-b1>
nodepool.NodePool: <AllocationSubRequest for -198.640776699 (out of 248.0) of bare-precise from rax-dfw>
nodepool.NodePool: <AllocationSubRequest for 1018.48543689 (out of 248.0) of bare-precise from hpcloud-b5>
nodepool.NodePool: <AllocationSubRequest for -44.5436893204 (out of 248.0) of bare-precise from rax-ord>
nodepool.NodePool: <AllocationSubRequest for -114.368932039 (out of 248.0) of bare-precise from hpcloud-b4>
---

In the comment of [2] I traced through, with a simple example, what is happening with the allocator when it produces negative values. That explains what happens in the included test-case, and I feel it also explains why the history-tracking got itself into this situation: because it promotes small allocations, it was vastly over-allocating, and it would just keep getting worse as more and more nodes started to fail. (A toy sketch of that feedback is appended at the end of this mail.)

The extant allocator doesn't notice this, especially in the current busy environment, because it would generally just result in already over-capacity requests being larger. It can still over-allocate (the test-case shows this), but because smaller allocations don't get any preferential treatment, it doesn't become a runaway problem.

Thus I do not think the existing change [3] really needs updating, other than being rebased on the fix (which it is).

-i

[1] http://nodepool.openstack.org/allocator-failure.log
[2] https://review.openstack.org/#/c/109185/
[3] https://review.openstack.org/#/c/109890/
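P.S. For anyone who doesn't want to wade through the comment in [2], below is a small stand-alone toy model of the feedback described above. To be clear, this is not the nodepool allocator code: the provider names, the deficit-style weighting, the history numbers and the per-pass bump are all assumptions of mine, picked only to show how promoting providers with small historical grants can drive the other sub-requests negative and make the numbers blow up on every pass while the failing provider's history stays flat.

---
#!/usr/bin/env python
# Toy model -- NOT the real nodepool allocator.  The provider names,
# the deficit-style weighting and the history numbers are all
# illustrative assumptions.

TOTAL = 248.0  # parent request, same size as in the log above

# hypothetical history: fraction of past launches each provider has
# actually delivered; the "flaky" provider keeps failing, so its
# history stays tiny.
history = {
    'good-a': 0.45,
    'good-b': 0.40,
    'flaky':  0.02,
}

def history_weighted_split(total, history):
    """Split 'total' using deficit-style weights (fair share minus
    historical share).  Over-served providers get negative weights,
    and normalising by the shrinking positive sum amplifies every
    sub-request."""
    fair = 1.0 / len(history)
    weights = {p: fair - h for p, h in history.items()}
    norm = sum(weights.values())
    return {p: total * w / norm for p, w in weights.items()}

for step in range(3):
    subs = history_weighted_split(TOTAL, history)
    print(step, {p: round(v, 1) for p, v in subs.items()},
          'sum=%.1f' % sum(subs.values()))
    # the flaky provider never delivers, so its history never grows,
    # while the healthy providers keep absorbing real launches
    history['good-a'] += 0.03
    history['good-b'] += 0.03
---

Already on the first pass the split looks like the log above (one huge positive sub-request, the rest negative, still summing to 248.0), and every subsequent pass is worse. For contrast, and still purely within this toy model, clamping the weights at zero -- i.e. giving no preference to small histories -- keeps every sub-request non-negative and bounded by the total, which matches the point above about the extant allocator over-allocating without running away.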