Greetings again, Mathieu, response inline...

On 01/18/2018 07:24 PM, Mathieu Gagné wrote:
So far, a couple challenges/issues:

We used to have fine-grained control over the calls a user could make to
the Nova API:
* os_compute_api:os-aggregates:add_host
* os_compute_api:os-aggregates:remove_host

This means we could make it so our technicians could *ONLY* manage
this aspect of our cloud.
With the placement API, it's all or nothing (and I found some weeks ago
that it's hardcoded to the "admin" role).
And you now have to craft your own curl calls, with no more UI in
Horizon. (let me know if I missed something regarding the ACL)

I will read about placement API and see with my coworkers how we could
adapt our systems/tools to use placement API instead. (assuming
disable_allocation_ratio_autoset will be implemented)
But ACL is a big concern for us if we go down that path.

OK, I think I may have stumbled upon a possible solution to this that would allow you to keep using the same host aggregate metadata APIs for setting allocation ratios. See below.

While I agree there are very technical/raw solutions to the issue
(like the ones you suggested), please understand that from our side,
this is still a major regression in the usability of OpenStack from an
operator point of view.

Yes, understood.

And it's unfortunate that I feel I now have to play catch up and
explain my concerns about a "fait accompli" that wasn't well
communicated to the operators and wasn't clearly mentioned in the
release notes.
I would have appreciated an email to the ops list explaining the
proposed change and asking if anyone had concerns/comments about it. I don't
often reply but I feel like I would have this time as this is a major
change for us.

Agree with you. Frankly, I did not realize this would be an issue. Had I known, of course we would have sent a note out about this and consulted with operators ahead of time.

Anyway, on to a possible solution.

For background, please see this bug:

https://bugs.launchpad.net/nova/+bug/1742747

When looking at that bug and the associated patch, I couldn't help but think that perhaps we could just change the default behaviour of the resource tracker when it encounters a nova.conf CONF.cpu_allocation_ratio value of 0.0.

The current behaviour of the nova-compute resource tracker is to follow the policy outlined in the CONF option's documentation: [1]

"From Ocata (15.0.0) this is used to influence the hosts selected by
the Placement API. Note that when Placement is used, the CoreFilter
is redundant, because the Placement API will have already filtered
out hosts that would have failed the CoreFilter.

This configuration specifies ratio for CoreFilter which can be set
per compute node. For AggregateCoreFilter, it will fall back to this
configuration value if no per-aggregate setting is found.

NOTE: This can be set per-compute, or if set to 0.0, the value
set on the scheduler node(s) or compute node(s) will be used
and defaulted to 16.0."

[1] https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L407-L418

What I believe we can do is change the behaviour so that if a 0.0 value is found in the nova.conf file on the nova-compute worker, then instead of defaulting to 16.0, the resource tracker would first check whether the compute node is associated with a host aggregate that has a "cpu_allocation_ratio" metadata item. If one is found, the host aggregate's cpu_allocation_ratio would be used. If not, the 16.0 default would be used.
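
To make the lookup order concrete, here is a rough sketch in Python. It is not the actual resource tracker code: the function name, its arguments and the constant are hypothetical and exist only for illustration. It also assumes that a non-zero value in nova.conf keeps taking precedence, and that when a node belongs to several aggregates that each set a ratio, the smallest one is used. That tie-break is my assumption and would need to be agreed on.

```python
# Illustrative sketch only; these names are hypothetical, not nova internals.
DEFAULT_CPU_ALLOCATION_RATIO = 16.0


def effective_cpu_allocation_ratio(conf_ratio, aggregates_metadata):
    """Pick the CPU allocation ratio for one compute node.

    conf_ratio: value of CONF.cpu_allocation_ratio from the node's nova.conf.
    aggregates_metadata: list of metadata dicts, one per host aggregate
        the compute node is associated with.
    """
    # A non-zero value in nova.conf is used as-is (assumed unchanged).
    if conf_ratio != 0.0:
        return conf_ratio

    # 0.0 means "not set locally": check host aggregate metadata first.
    ratios = []
    for metadata in aggregates_metadata:
        raw = metadata.get("cpu_allocation_ratio")
        if raw is None:
            continue
        try:
            ratios.append(float(raw))
        except (TypeError, ValueError):
            continue  # skip metadata values that are not numbers

    if ratios:
        # Assumption: with several aggregates, take the most conservative ratio.
        return min(ratios)

    # No aggregate sets a ratio: fall back to the existing default.
    return DEFAULT_CPU_ALLOCATION_RATIO
```

With this ordering, a node with cpu_allocation_ratio = 0.0 in nova.conf that sits in an aggregate with cpu_allocation_ratio=2.0 would get 2.0, and a node in no such aggregate would still get 16.0.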

What do you think?

Best,
-jay

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
