Clint,

I think you are categorically dismissing a very real ops challenge of how to 
set correct system limits, and how to adjust them in a running system. I have 
been stung by this challenge repeatedly over the years. As developers we 
*guess* at what a sensible default value for a limit will be, but we are 
sometimes wrong. When we are, that guess has a very real and very negative 
impact on users of production systems. The idea of using one limit for all 
users is idealistic; based on my experience, I’m convinced it's not the 
best approach in practice. What we usually want to do is bump up a limit for a 
single user, or dynamically drop a limit for all users. The problem is that 
very few systems implement limits in a way that lets them be adjusted while 
the system is running, and fewer still on a per-tenant basis. So yes, I will assert 
that having a quota implementation and the related complexity is justified by 
the ability to adapt limit levels while the system is running.
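
To make concrete what I mean by adapting limits at runtime, here is a rough 
sketch in Python (the class and names are invented for illustration, not an 
existing oslo API): a limit with a global default that can be raised or 
lowered on the fly, plus per-tenant overrides, so the common cases above 
don't require a config edit and a service bounce.

    # Hypothetical sketch only. It shows the shape of a limit that has a
    # mutable default, supports per-tenant overrides, and can be changed
    # without restarting the service.
    import threading

    class AdjustableQuota(object):
        def __init__(self, name, default):
            self._lock = threading.Lock()
            self.name = name
            self._default = default
            self._overrides = {}  # tenant_id -> limit

        def set_default(self, value):
            # Drop (or raise) the limit for every tenant, at runtime.
            with self._lock:
                self._default = value

        def set_override(self, tenant_id, value):
            # Bump the limit for a single tenant, at runtime.
            with self._lock:
                self._overrides[tenant_id] = value

        def limit_for(self, tenant_id):
            with self._lock:
                return self._overrides.get(tenant_id, self._default)

        def check(self, tenant_id, current_usage, requested=1):
            # Would the request push this tenant over its limit?
            return current_usage + requested <= self.limit_for(tenant_id)

    # Example: the hardcoded Heat default, with one hypothetical project
    # that legitimately needs more stacks bumped to 250.
    stacks = AdjustableQuota('max_stacks_per_tenant', 100)
    stacks.set_override('project-with-200-stacks', 250)
    print(stacks.check('project-with-200-stacks', 210))  # True
    print(stacks.check('some-other-project', 210))       # False

In a real service the default and the overrides would live in a database or 
a key-value store rather than in process memory, so that every API worker 
sees the change, but the shape of the interface is the point.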

Think for a moment about the pain that an ops team goes through when they have 
to take a service down that’s affecting thousands or tens of thousands of 
users. We have to send zillions of emails to customers, and we need to hold 
emergency change management meetings. We have to answer questions like “why 
didn’t you test for this?” when we did test for it, and it worked fine under 
simulation but failed in a real production environment under this particular 
stimulus. “Why can’t you take the system down in sections to keep the service 
up?” And the answer to all of this is “because the developers never put 
themselves in the shoes of the ops team when they designed it.”

Those who know me will attest to the fact that I care deeply about applying the 
KISS principle. The principle guides us to keep our designs as simple as 
possible unless it’s essential to make them more complex. In this case, the 
complexity is justified.

Now if there are production ops teams for large-scale systems that argue that 
dynamic limits and per-user overrides are pointless, then I’ll certainly 
reconsider my position.

Adrian

> On Dec 16, 2015, at 4:21 PM, Clint Byrum <cl...@fewbar.com> wrote:
> 
> Excerpts from Fox, Kevin M's message of 2015-12-16 16:05:29 -0800:
>> Yeah, as an op, I've run into a few things that need quotas but just have 
>> basically hardcoded values. Heat stacks, for example: it's a single global in 
>> /etc/heat/heat.conf:max_stacks_per_tenant=100. Instead of being able to 
>> tweak it for just our one project that legitimately has to create over 200 
>> stacks, I had to set it cloud wide and I had to bounce services to do it. 
>> Please don't do that.
>> 
>> Ideally, it would be nice if the quota stuff could be pulled out into its 
>> own shared lib (oslo?) and shared amongst projects so that they don't have 
>> to spend much effort implementing quotas. Maybe then things that need 
>> quotas but don't currently have them could more easily get them.
>> 
> 
> You had to change a config value, once, and that's worse than the added
> code complexity and server load that would come from tracking quotas for
> a distributed service?
> 

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
