Hi,
We ran a scalability test case to test lport creation and binding.
The git version is 06d4d4b on the master branch, with a patch to disable
inactivity probe messages between ovsdb-server and ovn-controller.
The test environment is deployed as follows:
- a Rally node, used to run the test case
- an OVN control node, used to run ovn-northd, the OVN northbound
ovsdb-server and the OVN southbound ovsdb-server
- 11 farm nodes, used to run sandboxes
Each node is a bare-metal machine with the following hardware spec:
- Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz x 4, 40 cores in total
- 251G memory
Sandboxes are divided into groups of 50 sandboxes each. Farm node 0 and
farm node 1 each host one group; each of the other farm nodes hosts two
groups. That is, 1000 sandboxes (20 groups) are created on the 11
bare-metal machines.
The test steps are as follows:
1. Create 1000 sandboxes in bridge mode; that is, an additional bridge 'br0'
is created and registered via external-ids:ovn-bridge-mappings.
2. In the OVN northbound database, create 5 lswitches and 200 lports for each
lswitch. An additional lport of type 'localnet' is added to each lswitch
(see the command sketch below).
3. For each lport created in step 2, bind it to a randomly chosen sandbox in
one group, then use 'ovn-nbctl wait-until Logical_Port <port-name> up=true'
to wait until the port is 'up' in the northbound database.
4. Go to step 2.
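For reference, steps 1 and 2 boil down to roughly the following commands.
This is a sketch only, assuming the old-style ovn-nbctl sub-commands
(lswitch-add, lport-add, ...) available at this commit; the physical
network name 'physnet' and the switch/port names are placeholders, and the
real logic lives in the Rally plugin:

    # step 1, per sandbox: map a physical network name to the extra bridge 'br0'
    ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=physnet:br0

    # step 2, on the control node: one lswitch with 200 lports plus a localnet lport
    ovn-nbctl lswitch-add lswitch-0
    for i in $(seq 0 199); do ovn-nbctl lport-add lswitch-0 lport-0-$i; done
    ovn-nbctl lport-add lswitch-0 localnet-0
    ovn-nbctl lport-set-type localnet-0 localnet
    ovn-nbctl lport-set-options localnet-0 network_name=physnet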
Here is a brief test result.
lport    1k lports    ovn-northd    ovsdb-server NB    ovsdb-server SB    ovnnb.db    ovnsb.db
number   bind time    memory(kB)    memory(kB)         memory(kB)         size(kB)    size(kB)
1000 ??? 6416 294784 8716 372 1519
2000 484.526 9872 1089264 1188 742 2549
3000 594.438 13484 2385536 12476 1111 3578
4000 685.491 17736 4176240 14920 1481 4608
5000 872.705 21704 6476420 17424 1851 5638
6000 958.363 25580 9272100 19844 2220 6668
7000 1142.056 29472 12561300 22268 2590 7698
8000 1258.395 33780 16346944 24676 2960 8728
9000 1446.025 37680 20653952 27184 3330 9757
10000 1567.252 41680 25446148 31808 3699 8364
11000 1800.981 45824 30750804 34248 4069 9394
12000 1940.967 49624 36541272 36408 4439 9873
13000 2117.564 53640 42843712 39108 4808 10681
14000 2231.282 57076 49627496 125672 5178 11465
15000 2448.585 61600 56893928 133864 5548 12271
16000 2614.816 65832 64678388 142184 5918 13040
17000 2839.524 69984 72993472 150816 6287 13831
18000 2952.906 73924 81802688 160484 6657 14630
19000 3143.878 77932 91138948 168676 7027 15444
20000 1529.746 81844 100955012 176868 7397 16233
Details:
- The 'lport number' column is the number of lports already created. Each row
is the result of creating and binding 1000 lports.
- The '1k lports bind time' column is the total time to bind 1000 lports to
sandboxes; lports are bound one by one. For each lport, the time consists of:
  - ssh to a farm node, use ovs-vsctl to add the lport to 'br-int' and update
    the Interface table
  - ssh to the control node, use ovn-nbctl to wait until the lport's 'up'
    column is 'true' in the Logical_Port table
  (The per-lport command sequence is sketched after these details.) If we
  create only one sandbox, one lswitch and one lport, then bind the lport to
  the sandbox, the time is about 100ms.
- The 'ovn-northd', 'ovsdb-server northbound' and 'ovsdb-server southbound'
columns are the memory usage of these three processes, in kB.
- The 'ovnnb.db' and 'ovnsb.db' columns are the sizes of the DB files, in kB.
- The last row shows a bind time about half that of the previous row, because
the last 1000 lports are bound to sandboxes on farm node 0, which has only
50 sandboxes, while the previous 1000 lports were bound to farm node 10,
which has 100 sandboxes.
While binding lports to sandboxes, ovn-controller's CPU usage becomes very
high for several seconds. After 3k lports are bound, total CPU usage on the
farm node whose sandboxes the lports are being bound to is about 100%. This
would not happen in a production environment, where each bare-metal machine
runs only one ovn-controller.
The OVN northbound ovsdb-server's memory usage grows very fast; we are
looking into the problem.
The test case is implemented as an openstack/rally plugin; it is available at
https://github.com/l8huang/rally
BR
Huang Lei