Thanks Joe, I really appreciate these numbers.
For an individual (cascaded) Neutron, then, your testing showed that it
could happily handle 1000 compute hosts. Apart from the cascading on
the northbound side, was that otherwise unmodified from vanilla
OpenStack? Do you recall any particular config settings that were
needed to achieve that? (e.g. api_workers and rpc_workers)
Regards,
Neil
On 16/04/15 03:03, joehuang wrote:
"In case it's helpful to see all the cases together, sync_routers (from the L3
agent) was also mentioned in another part of this thread. Plus of course the liveness
reporting from all agents."
In the test report [1], which shows that Neutron can support up to a million
ports and 100k physical hosts, the scalability is achieved by having one
cascading Neutron manage 100 cascaded Neutrons through the current Neutron
RESTful API.
In a normal Neutron, each compute node hosts the L2 agent/OVS and the L3
agent/DVR. In the cascading Neutron layer, the L2 agent is modified to interact
with the corresponding cascaded Neutron instead of OVS, and the L3 agent (DVR)
is modified to interact with the corresponding cascaded Neutron instead of
Linux routing. That's why we call the cascaded Neutron the backend of Neutron.
Therefore, only 100 compute nodes (or, say, agents) are required in the
cascading layer, and each of them manages one cascaded Neutron. Each cascaded
Neutron can manage up to 1000 nodes (existing reports, deployments and lab
tests show this is supportable). That is how the scalability reaches 100k nodes.
Because the cloud is split into two layers (100 nodes in the cascading layer,
1000 nodes in each cascaded layer), even the current mechanisms can meet the
demand for sync_routers and liveness reporting from all agents, as well as L2
population, DVR router updates, etc.
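The sizing argument above can be sketched in a few lines of Python. This is
only an illustration, assuming the layer sizes quoted in this mail (100
cascaded Neutrons, up to 1000 nodes each); the function names are hypothetical
and are not part of Neutron or the cascading code.

```python
def total_nodes(cascaded_neutrons: int, nodes_per_cascaded: int) -> int:
    """Compute nodes reachable through one cascading Neutron."""
    return cascaded_neutrons * nodes_per_cascaded


def largest_fanout(cascaded_neutrons: int, nodes_per_cascaded: int) -> int:
    """Largest number of agents/nodes any single Neutron has to deal with."""
    return max(cascaded_neutrons, nodes_per_cascaded)


if __name__ == "__main__":
    # Layer sizes taken from the figures quoted in this thread.
    print(total_nodes(100, 1000))     # 100000
    print(largest_fanout(100, 1000))  # 1000
```

The point of the second function is that no single Neutron server sees more
than 1000 nodes, even though the cloud as a whole has 100k.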
The test report [1] at least proves that the layered architecture idea is
feasible for Neutron scalability, even up to a million ports and 100k nodes.
An extra benefit of the layered architecture is that each cascaded Neutron can
use a different backend implementation: for example, one could be ML2+OVS,
another OVN, ODL or Calico...
[1] Test report for million-port scalability of Neutron
http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers
Best Regards
Chaoyi Huang ( Joe Huang )
-----Original Message-----
From: Neil Jerram [mailto:neil.jer...@metaswitch.com]
Sent: Wednesday, April 15, 2015 9:46 PM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Hi again Joe, (+ list)
On 11/04/15 02:00, joehuang wrote:
Hi, Neil,
See inline comments.
Best Regards
Chaoyi Huang
________________________________________
From: Neil Jerram [neil.jer...@metaswitch.com]
Sent: 09 April 2015 23:01
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
Hi Joe,
Many thanks for your reply!
On 09/04/15 03:34, joehuang wrote:
Hi, Neil,
In theory, Neutron is like a "broadcast" domain: for example, enforcing DVR and
security groups has to touch every host where a VM of the project resides. Even
with an SDN controller, this "touch" to each relevant host is inevitable. If
there are many physical hosts inside one Neutron, for example 10k, it is very
hard to overcome the "broadcast storm" issue under concurrent operations. That
is the bottleneck for Neutron scalability.
I think I understand that in general terms - but can you be more
specific about the broadcast storm? Is there one particular message
exchange that involves broadcasting? Is it only from the server to
agents, or are there 'broadcasts' in other directions as well?
[[joehuang]] For example, L2 population, security group rule updates, and DVR
route updates. Both directions, depending on the scenario.
Thanks. In case it's helpful to see all the cases together, sync_routers (from
the L3 agent) was also mentioned in another part of this thread. Plus of course
the liveness reporting from all agents.
(I presume you are talking about control plane messages here, i.e.
between Neutron components. Is that right? Obviously there can also
be broadcast storm problems in the data plane - but I don't think
that's what you are talking about here.)
[[joehuang]] Yes, control plane here.
Thanks for confirming that.
We need a layered architecture in Neutron to solve the "broadcast
domain" scalability bottleneck. The test report from OpenStack
cascading shows that, through the layered architecture "Neutron
cascading", Neutron can support up to a million ports and 100k
physical hosts. You can find the report here:
http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers
Many thanks, I will take a look at this.
It was very interesting, thanks. And by following through your links I also
learned more about Nova cells, and about how some people question whether we
need any kind of partitioning at all, and should instead solve
scaling/performance problems in other ways... It will be interesting to see
how this plays out.
I'd still like to see more information, though, about how far people have
scaled OpenStack - and in particular Neutron - as it exists today.
Surely having a consensus set of current limits is an important input into
any discussion of future scaling work.
For example, Kevin mentioned benchmarking where the Neutron server processed a
liveness update in <50ms and a sync_routers in 300ms.
Suppose the liveness update time was 50ms (since I don't know in detail what that
< means) and agents report liveness every 30s. Does that mean that a single
Neutron server can only support 600 agents?
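The 600 figure comes from dividing the report interval by the per-report
handling time. A minimal sketch of that arithmetic follows, assuming reports
are handled strictly one after another and the server does nothing else; the
function name and the `workers` parameter are hypothetical, added only to show
how the bound would scale if several workers handled reports in parallel.

```python
def max_agents(report_interval_ms: int, handling_time_ms: int, workers: int = 1) -> int:
    """Upper bound on agents whose liveness reports fit into one interval.

    Assumes each report occupies a worker for handling_time_ms and that
    workers do no other work (a simplification, not a measured result).
    """
    return workers * (report_interval_ms // handling_time_ms)


if __name__ == "__main__":
    # 30 s report interval and 50 ms per liveness update, as in the text.
    print(max_agents(30_000, 50))             # 600
    print(max_agents(30_000, 50, workers=4))  # 2400
```

Since the benchmark reported "<50ms", 600 per worker is a pessimistic bound
under these assumptions rather than a measured limit.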
I'm also especially interested in the DHCP agent, because in Calico we have one
of those on every compute host. We've just run tests which appeared to be
hitting trouble from just 50 compute hosts onwards, and apparently because of
DHCP agent communications. We need to continue looking into that and report
findings properly, but if anyone already has any insights, they would be much
appreciated.
Many thanks,
Neil
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev