On 11/08/14 14:49, Clint Byrum wrote:
> Excerpts from Steven Hardy's message of 2014-08-11 11:40:07 -0700:
>> On Mon, Aug 11, 2014 at 11:20:50AM -0700, Clint Byrum wrote:
>>> Excerpts from Zane Bitter's message of 2014-08-11 08:16:56 -0700:
>>>> On 11/08/14 10:46, Clint Byrum wrote:
>>>>> Right now we're stuck with an update that just doesn't work. It isn't
>>>>> just about update-failure-recovery, which is coming along nicely, but
>>>>> it is also about the lack of signals to control rebuild, poor support
>>>>> for addressing machines as groups, and unacceptable performance in
>>>>> large stacks.

>>>> Are there blueprints/bugs filed for all of these issues?


>>> Convergence addresses the poor performance for large stacks in general.
>>> We also have this:
>>>
>>> https://bugs.launchpad.net/heat/+bug/1306743
>>>
>>> Which shows how slow metadata access can get. I have worked on patches
>>> but haven't been able to complete them. We made big strides, but we are
>>> at a point where 40 nodes polling Heat every 30s is too much for one CPU

This sounds like the same figure I heard at the design summit; did the
DB call optimisation work that Steve Baker did immediately after that
not have any effect?

>>> to handle. When we scaled Heat out onto more CPUs on one box by
>>> forking, we ran into eventlet issues. We also ran into issues because
>>> even with many processes we can only use one to resolve templates for
>>> a single stack during update, which was also excessively slow.
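
To put rough numbers on that claim (illustrative only; the node count
and interval are the figures quoted in this thread, and the real
per-request cost is dominated by the DB-heavy metadata resolution
described in bug 1306743):

    # Back-of-envelope sketch of the polling load described above. The
    # node count and interval come from this thread; the per-request
    # CPU cost is a purely hypothetical round number.
    nodes = 40              # servers running os-collect-config
    poll_interval = 30.0    # seconds between metadata polls per node
    rate = nodes / poll_interval            # ~1.3 requests/sec
    cpu_per_request = 0.75  # hypothetical: 750ms of CPU per request
    print(rate * cpu_per_request)           # ~1.0 -> one CPU saturated

The point being that the load grows linearly with node count, so a
fixed polling interval will always outrun a single engine process
eventually.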

>> Related to this, and a discussion we had recently at the TripleO
>> meetup, is this spec I raised today:
>>
>> https://review.openstack.org/#/c/113296/
>>
>> It's following up on the idea that we could potentially address (or at
>> least mitigate, pending the fully convergence-ified Heat) some of these
>> scalability concerns, if TripleO moves from the one-giant-template model
>> to a more modular nested-stack/provider model (e.g. what Tomas has been
>> working on).
>>
>> I've not got into enough detail on that yet to be sure whether it's
>> achievable for Juno, but initially it seems complex-but-doable.
>>
>> I'd welcome feedback on that idea and how it may fit in with the more
>> granular convergence-engine model.
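
To make the shape of that concrete, here is a minimal sketch of the
provider-resource approach (the type name OS::TripleO::Compute and the
file names are illustrative placeholders, not taken from the spec):

    # The environment's resource_registry maps a custom type to a nested
    # template, so each group member becomes its own stack that Heat can
    # update independently of its siblings.
    environment = {
        'resource_registry': {
            # hypothetical provider template name
            'OS::TripleO::Compute': 'compute.yaml',
        }
    }

    top_level_template = '''
    heat_template_version: 2013-05-23
    resources:
      compute_group:
        type: OS::Heat::ResourceGroup
        properties:
          count: 40
          resource_def:
            type: OS::TripleO::Compute   # resolved via the registry above
    '''

    # Passed to Heat along the lines of (argument names approximate):
    #   heat.stacks.create(stack_name='overcloud',
    #                      template=top_level_template,
    #                      environment=environment,
    #                      files={'compute.yaml': open('compute.yaml').read()})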

>> Can you link to the eventlet/forking issues bug, please? I thought
>> since bug #1321303 was fixed that multiple engines and multiple workers
>> should work OK, and obviously that being true is a precondition to
>> expending significant effort on the nested-stack decoupling plan above.


> That was the issue. So we fixed that bug, but we never un-reverted the
> patch that forks enough engines to use up all the CPUs on a box by
> default. That would likely help a lot with metadata access speed (we
> could manually do it in TripleO, but we tend to push defaults. :)

Right, and we decided we wouldn't because it's wrong to do that to
people by default. In some cases the optimal running configuration for
TripleO will differ from the friendliest out-of-the-box configuration
for Heat users in general, and in those cases - of which this is one -
TripleO will need to specify the configuration.
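
For the record, specifying it would look something like this (a sketch,
assuming the heat.conf option is still called num_engine_workers; check
your Heat release before relying on the name):

    # Run one heat-engine worker per CPU, which is roughly what the
    # reverted patch would have made the default:
    #
    #   # /etc/heat/heat.conf
    #   [DEFAULT]
    #   num_engine_workers = 8    # e.g. cpu_count() on an 8-core box
    #
    import multiprocessing
    print('num_engine_workers = %d' % multiprocessing.cpu_count())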

cheers,
Zane.

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
