Re: [openstack-dev] [neutron] Neutron scaling datapoints?

joehuang Sun, 12 Apr 2015 19:22:58 -0700

Hi, Kevin and Joshua,

As I understand it, Tooz only addresses agent status management. How would 
we handle the impact of concurrent dynamic load at large scale (for example, 
100k managed nodes with dynamic load such as security group rule updates, 
routers_updated, etc.)?

One more question: if we have 100k managed nodes, how do we partition them? 
Or will all nodes be managed by a single Tooz backend such as ZooKeeper? Can 
ZooKeeper manage the status of 100k nodes?
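
To put the 100k-node question in numbers, here is a back-of-the-envelope sketch. It assumes every agent reports state once per 30-second interval (an assumed figure, not taken from this thread) and reuses the <50 ms per-heartbeat upper bound Kevin reports further down. The function name is hypothetical.

```python
from typing import Tuple


def heartbeat_load(num_agents: int, report_interval_s: float,
                   cost_per_heartbeat_s: float) -> Tuple[float, float]:
    """Return (heartbeats per second, processing-seconds needed per second).

    The second value approximates how many workers must be busy full-time
    just to keep up with heartbeats, ignoring batching and contention.
    """
    rate = num_agents / report_interval_s        # heartbeats arriving per second
    busy_workers = rate * cost_per_heartbeat_s   # processing-seconds per wall-second
    return rate, busy_workers


if __name__ == "__main__":
    # Assumed inputs: 100k agents, 30 s interval, 50 ms per heartbeat (upper bound).
    rate, workers = heartbeat_load(100_000, 30.0, 0.050)
    print(f"{rate:.0f} heartbeats/s, ~{workers:.0f} busy workers")
```

With these assumed inputs it prints `3333 heartbeats/s, ~167 busy workers`, which suggests that a per-heartbeat cost that looks small in isolation can still add up at this scale.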

Best Regards
Chaoyi Huang ( Joe Huang )

From: Kevin Benton [mailto:[email protected]]
Sent: Monday, April 13, 2015 3:52 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?

>Timestamps are just one way (and likely the most primitive), using redis (or 
>memcache) key/value and expiry are another (and letting memcache or redis 
>expire using its own internal algorithms), using zookeeper ephemeral nodes[1] 
>are another... The point being that it's backend-specific and tooz supports 
>varying backends.

Very cool. Is the backend completely transparent, so a deployer could choose a 
service they are comfortable maintaining, or will that choice change the 
resiliency of state across node restarts, partitions, etc.?
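
The "pluggable backend" question can be illustrated with a driver registry keyed on URL scheme. tooz itself picks its driver from the backend URL passed to `tooz.coordination.get_coordinator()`, but every class and the registry below are hypothetical stand-ins, not tooz internals. The sketch only shows that callers see one interface; guarantees such as what survives a restart or a partition still come from whichever driver sits behind it.

```python
from urllib.parse import urlparse


class LivenessBackend:
    """Hypothetical common interface that every liveness driver implements."""

    def __init__(self, url: str) -> None:
        self.url = url

    def heartbeat(self, member_id: str) -> None:
        raise NotImplementedError


class MemcacheLikeBackend(LivenessBackend):
    """Hypothetical driver: liveness via keys that expire on their own."""

    def heartbeat(self, member_id: str) -> None:
        pass  # placeholder: a real driver would refresh a key with a TTL


class ZookeeperLikeBackend(LivenessBackend):
    """Hypothetical driver: liveness via ephemeral nodes tied to a session."""

    def heartbeat(self, member_id: str) -> None:
        pass  # placeholder: a real driver would keep its session alive


# Hypothetical registry mapping URL scheme -> driver class.
DRIVERS = {
    "memcached": MemcacheLikeBackend,
    "zookeeper": ZookeeperLikeBackend,
}


def get_backend(url: str) -> LivenessBackend:
    """Pick a driver from the URL scheme, e.g. 'zookeeper://host:2181'."""
    scheme = urlparse(url).scheme
    driver_cls = DRIVERS.get(scheme)
    if driver_cls is None:
        raise ValueError(f"no liveness driver for scheme {scheme!r}")
    return driver_cls(url)
```

Swapping `memcached://...` for `zookeeper://...` changes nothing for the caller, which is why the restart and partition behaviour has to be checked per driver rather than assumed.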

The Nova implementation using Tooz seemed pretty straightforward, although it 
looked like Nova already had pluggable drivers for service management. Before I 
dig into it much further, I'll file a spec on the Neutron side to see if I can 
get some other cores on board to do the review work if I push a change to tooz.

On Sun, Apr 12, 2015 at 9:38 AM, Joshua Harlow 
<[email protected]> wrote:
Kevin Benton wrote:
So IIUC tooz would be handling the liveness detection for the agents.
It would be nice to get rid of that logic in Neutron and just
register callbacks for rescheduling dead agents.

Where does it store that state? Does it persist timestamps to the DB
like Neutron does? If so, how would that scale better? If not, whom does
a given node ask whether an agent is online or offline when making a
scheduling decision?
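
For reference, the timestamp approach can be sketched as follows: each heartbeat overwrites a last-seen time, and a scheduler treats an agent as down once that time is older than a threshold. The dict stands in for a database table, and the class name and 75-second default threshold are assumptions for illustration only.

```python
import time
from typing import Dict, List, Optional


class TimestampLiveness:
    """Sketch of timestamp-based liveness; the dict stands in for a DB table."""

    def __init__(self, down_time_s: float = 75.0) -> None:
        self.down_time_s = down_time_s          # assumed "agent is down" threshold
        self.last_seen: Dict[str, float] = {}   # agent id -> last heartbeat time

    def heartbeat(self, agent_id: str, now: Optional[float] = None) -> None:
        # Every heartbeat is a write: one row update per agent per interval.
        self.last_seen[agent_id] = time.monotonic() if now is None else now

    def is_alive(self, agent_id: str, now: Optional[float] = None) -> bool:
        # Every scheduling decision is a read plus a comparison.
        seen = self.last_seen.get(agent_id)
        if seen is None:
            return False
        now = time.monotonic() if now is None else now
        return now - seen <= self.down_time_s

    def alive_agents(self, now: Optional[float] = None) -> List[str]:
        now = time.monotonic() if now is None else now
        return sorted(a for a in self.last_seen if self.is_alive(a, now))
```

Both the write on every heartbeat and the read on every scheduling decision hit the same store, which is the scaling concern behind the question.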

Timestamps are just one way (and likely the most primitive), using redis (or 
memcache) key/value and expiry are another (and letting memcache or redis 
expire using its own internal algorithms), using zookeeper ephemeral nodes[1] 
are another... The point being that it's backend-specific and tooz supports 
varying backends.
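
The expiry-based alternative moves the "is it stale?" check into the store: each heartbeat rewrites a key with a time-to-live, and a key that is not refreshed disappears. The in-memory class below only imitates that behaviour so it runs without a server; with a real Redis or memcached backend the store evicts the key itself (for example via an expiry on SET). None of these method names belong to either client library, and the `agents/` key layout is hypothetical.

```python
import time
from typing import Dict, List, Optional


class ExpiringKeyStore:
    """In-memory imitation of a key/value store whose keys expire on their own."""

    def __init__(self) -> None:
        self._expires_at: Dict[str, float] = {}  # key -> absolute expiry time

    def set_with_ttl(self, key: str, ttl_s: float, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._expires_at[key] = now + ttl_s

    def live_keys(self, prefix: str, now: Optional[float] = None) -> List[str]:
        now = time.monotonic() if now is None else now
        # Drop expired keys lazily, standing in for the store's own eviction.
        for key in [k for k, exp in self._expires_at.items() if exp <= now]:
            del self._expires_at[key]
        return sorted(k[len(prefix):] for k in self._expires_at if k.startswith(prefix))


def agent_heartbeat(store: ExpiringKeyStore, agent_id: str, ttl_s: float,
                    now: Optional[float] = None) -> None:
    # Hypothetical key layout: one key per agent under an "agents/" prefix.
    store.set_with_ttl(f"agents/{agent_id}", ttl_s, now)
```

Readers never compare timestamps themselves; the trade-off is that liveness now depends on how promptly the chosen backend actually expires keys.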

However, before (what I assume is) the large code change to implement
tooz, I would like to confirm that the heartbeats are actually a
bottleneck. When I profiled them on the master branch a few months ago,
processing a heartbeat took an order of magnitude less time (<50ms) than
the l3 agent's 'sync routers' task (~300ms). A few query optimizations
might buy us a lot more headroom before we have to fall back to large
refactors.
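
Quantifying before refactoring can be as simple as a timing harness. The two handlers below are hypothetical placeholders, with `time.sleep` standing in for real work; in practice you would wrap the actual heartbeat handler and the l3 agent's router sync against a realistic database.

```python
import statistics
import time
from typing import Callable, Dict


def time_calls(fn: Callable[[], None], repeats: int = 20) -> Dict[str, float]:
    """Call fn repeatedly and report latency statistics in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}


def fake_process_heartbeat() -> None:
    time.sleep(0.005)  # hypothetical placeholder for the real heartbeat handler


def fake_sync_routers() -> None:
    time.sleep(0.03)   # hypothetical placeholder for the real 'sync routers' task


if __name__ == "__main__":
    for name, fn in [("heartbeat", fake_process_heartbeat),
                     ("sync_routers", fake_sync_routers)]:
        print(name, time_calls(fn, repeats=5))
```

Reporting the median alongside the maximum keeps one slow outlier (a lock wait, a GC pause) from hiding or inflating the typical cost.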

Sure, always good to avoid prematurely optimizing things...

Although I think this is relevant for you anyway:

https://review.openstack.org/#/c/138607/ (the same, or nearly the same, thing in Nova)...

https://review.openstack.org/#/c/172502/ (a WIP implementation of the latter).

[1] 
https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#Ephemeral+Nodes

Kevin Benton wrote:

    One of the most common is the heartbeat from each agent. However, I
    don't think we can eliminate them because they are used to determine
    if the agents are still alive for scheduling purposes. Did you have
    something else in mind to determine if an agent is alive?

Put each agent in a tooz[1] group, have each agent periodically
heartbeat[2], and have whoever needs to schedule read the active members
of that group (or use [3] to get notified via a callback). Profit...
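
The recipe above maps onto tooz's group API (`create_group`, `join_group`, `heartbeat`, `get_members`, `watch_leave_group`, `run_watchers`). So that the sketch runs without a ZooKeeper, Redis, or memcached server, the class below is a hypothetical in-memory stand-in for the pattern: members join a group, keep heartbeating, and a scheduler reads the live members or gets a callback when one drops out. It is not tooz code, and its method names only loosely mirror tooz's.

```python
from typing import Callable, Dict, List, Set

LeaveCallback = Callable[[str, str], None]  # (group, member) -> None


class InMemoryGroups:
    """Hypothetical stand-in for a coordination backend's group membership."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.groups: Dict[str, Dict[str, float]] = {}  # group -> member -> last heartbeat
        self.leave_callbacks: Dict[str, List[LeaveCallback]] = {}

    def join_group(self, group: str, member: str, now: float) -> None:
        self.groups.setdefault(group, {})[member] = now

    def heartbeat(self, member: str, now: float) -> None:
        # One heartbeat refreshes the member in every group it belongs to.
        for members in self.groups.values():
            if member in members:
                members[member] = now

    def watch_leave_group(self, group: str, callback: LeaveCallback) -> None:
        self.leave_callbacks.setdefault(group, []).append(callback)

    def run_watchers(self, now: float) -> None:
        # Expire silent members and fire leave callbacks (e.g. reschedule their work).
        for group, members in self.groups.items():
            dead = [m for m, seen in members.items() if now - seen > self.timeout_s]
            for member in dead:
                del members[member]
                for cb in self.leave_callbacks.get(group, []):
                    cb(group, member)

    def get_members(self, group: str) -> Set[str]:
        return set(self.groups.get(group, {}))


if __name__ == "__main__":
    coord = InMemoryGroups(timeout_s=30.0)
    rescheduled: List[str] = []
    coord.join_group("l3-agents", "agent-a", now=0.0)
    coord.join_group("l3-agents", "agent-b", now=0.0)
    coord.watch_leave_group("l3-agents", lambda g, m: rescheduled.append(m))
    coord.heartbeat("agent-a", now=25.0)  # agent-b stays silent
    coord.run_watchers(now=40.0)
    print(coord.get_members("l3-agents"), rescheduled)
```

In a real deployment the expiry would happen inside the coordination backend and the callbacks in whichever process schedules; the sketch only shows the control flow.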

Pick your favorite (supporting) driver from:

http://docs.openstack.org/developer/tooz/compatibility.html

[1] http://docs.openstack.org/developer/tooz/compatibility.html#grouping
[2] https://github.com/openstack/tooz/blob/0.13.1/tooz/coordination.py#L315
[3] http://docs.openstack.org/developer/tooz/tutorial/group_membership.html#watching-group-changes

______________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

--
Kevin Benton
