Re-posting after becoming a member.
I want to post some more details on the issues that Matt posted earlier about the OVN scaling measurements. The symptoms seem to be the same in both Matt's router scaling tests and the VM scaling tests that we ran on the same environment.

In this case we created VMs with two network interfaces, one for the provider network and another for the private network. On the private network side, each tenant gets two private networks, each with one subnet and one VM on each network. Once the VMs boot up, they start iperf traffic between the VMs in the private network.

At around 330 VMs we started hitting errors in the Rally benchmark. Looking through the logs on the system where ovn-northd is deployed, I see that the connection to neutron stopped working.

- CONNECTION TO NEUTRON DROPPED

2016-01-28T04:06:31.558Z|00430|ovsdb_file|INFO|/var/lib/openvswitch/ovnnb.db: compacting database online (1453083247.677 seconds old, 4233 transactions, 10503193 bytes)
2016-01-28T04:19:57.769Z|00431|reconnect|ERR|tcp:10.138.7.225:36487: no response to inactivity probe after 5 seconds, disconnecting
2016-01-28T04:20:08.288Z|00432|memory|INFO|peak resident set size grew 145% in last 1341.4 seconds, from 27464 kB to 67328 kB
2016-01-28T04:20:08.288Z|00433|memory|INFO|cells:230931 monitors:190 sessions:190
2016-01-28T04:25:04.472Z|00434|ovsdb_file|INFO|/var/lib/openvswitch/ovnsb.db: compacting database online (1453084360.588 seconds old, 6892 transactions, 10488324 bytes)
2016-01-28T04:27:38.367Z|00435|reconnect|ERR|tcp:10.138.109.44:57841: no response to inactivity probe after 5 seconds, disconnecting
2016-01-28T04:28:36.182Z|00436|reconnect|ERR|tcp:10.138.7.223:55050: no response to inactivity probe after 5 seconds, disconnecting
2016-01-28T04:29:31.554Z|00437|reconnect|ERR|tcp:10.138.109.47:46113: no response to inactivity probe after 5 seconds, disconnecting
2016-01-28T05:15:04.831Z|00493|ovsdb_file|INFO|/var/lib/openvswitch/ovnsb.db: compacting database online (1453047567.212 seconds old, 385 transactions, 12170216 bytes)
2016-01-28T05:15:05.411Z|00494|timeval|WARN|Unreasonably long 1020ms poll interval (996ms user, 21ms system)
2016-01-28T05:15:05.411Z|00495|timeval|WARN|faults: 3845 minor, 0 major
2016-01-28T05:15:05.411Z|00496|timeval|WARN|disk: 0 reads, 23328 writes
2016-01-28T05:15:05.411Z|00497|timeval|WARN|context switches: 208 voluntary, 1 involuntary
2016-01-28T05:15:05.411Z|00498|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=a87630a6:
2016-01-28T05:15:05.411Z|00499|coverage|INFO|hmap_pathological 49.4/sec 22.450/sec 44.1908/sec total: 198889
2016-01-28T05:15:05.411Z|00500|coverage|INFO|hmap_expand 4873.2/sec 2320.833/sec 2205.4286/sec total: 21266361
2016-01-28T05:15:05.411Z|00501|coverage|INFO|lockfile_lock 0.0/sec 0.000/sec 0.0019/sec total: 14
2016-01-28T05:15:05.411Z|00502|coverage|INFO|lockfile_unlock 0.0/sec 0.000/sec 0.0022/sec total: 13
2016-01-28T05:15:05.411Z|00503|coverage|INFO|poll_create_node 10204.8/sec 11800.700/sec 12457.8392/sec total: 484911073
2016-01-28T05:15:05.411Z|00504|coverage|INFO|poll_zero_timeout 2.0/sec 4.467/sec 4.0672/sec total: 329223
2016-01-28T05:15:05.411Z|00505|coverage|INFO|seq_change 0.0/sec 0.000/sec 0.0000/sec total: 3
2016-01-28T05:15:05.411Z|00506|coverage|INFO|pstream_open 0.0/sec 0.000/sec 0.0000/sec total: 3
2016-01-28T05:15:05.411Z|00507|coverage|INFO|unixctl_received 0.0/sec 0.000/sec 0.0000/sec total: 7
2016-01-28T05:15:05.411Z|00508|coverage|INFO|unixctl_replied 0.0/sec 0.000/sec 0.0000/sec total: 7
2016-01-28T05:15:05.411Z|00509|coverage|INFO|util_xalloc 761596.4/sec 419504.017/sec 222669.7517/sec total: 1841629427
2016-01-28T05:15:05.411Z|00510|coverage|INFO|5 events never hit

Looking at the neutron server log, I see an ACL referential integrity violation error. I am still digging into this; let me know if you need these log files.

2016-01-28 04:55:04.723 2757 WARNING requests.packages.urllib3.connectionpool [req-7964fc18-a95c-4464-a568-038a453d006e 7a0b7c6414734a93b5dffbc666534690 e53d6cd40c5140c58ec014eb56070917 - - -] Connection pool is full, discarding connection: identity.open.softlayer.com
2016-01-28 04:55:04.772 2757 WARNING requests.packages.urllib3.connectionpool [-] Connection pool is full, discarding connection: identity.open.softlayer.com
2016-01-28 04:55:05.776 2757 WARNING requests.packages.urllib3.connectionpool [req-6b80241f-0928-4d93-a80e-988a7b3e9690 7a0b7c6414734a93b5dffbc666534690 e53d6cd40c5140c58ec014eb56070917 - - -] Connection pool is full, discarding connection: identity.open.softlayer.com
2016-01-28 04:55:55.380 2757 ERROR neutron.agent.ovsdb.impl_idl [-] OVSDB Error: {"details":"Table Logical_Switch column acls row 36d940b2-26cc-426a-bda6-dd2491f18397 references nonexistent row 0644cffd-71ca-467f-8e1d-6652968870ef in table ACL.","error":"referential integrity violation"}
2016-01-28 04:55:55.443 2757 ERROR neutron.agent.ovsdb.impl_idl [req-0bd44851-ecd9-4957-978f-350b52ada25b cc27c50b17fc4954905db5f3f3eed730 e53d6cd40c5140c58ec014eb56070917 - - -] Traceback (most recent call last):
  File "/opt/neutron/lib/python2.7/site-packages/neutron/agent/ovsdb/native/connection.py", line 99, in run
    txn.results.put(txn.do_commit())
  File "/opt/neutron/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py", line 106, in do_commit
    raise RuntimeError(msg)
RuntimeError: OVSDB Error: {"details":"Table Logical_Switch column acls row 36d940b2-26cc-426a-bda6-dd2491f18397 references nonexistent row 0644cffd-71ca-467f-8e1d-6652968870ef in table ACL.","error":"referential integrity violation"}
2016-01-28 04:55:55.728 2757 ERROR neutron.api.v2.resource [req-0bd44851-ecd9-4957-978f-350b52ada25b cc27c50b17fc4954905db5f3f3eed730 e53d6cd40c5140c58ec014eb56070917 - - -] create failed
2016-01-28 04:55:55.728 2757 ERROR neutron.api.v2.resource Traceback (most recent call last):
2016-01-28 04:55:55.728 2757 ERROR neutron.api.v2.resource File "/opt/neutron/lib/python2.7/site-packages/neutron/api/v2/resource.py", line 83, in resource

Thanks,
Mala
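
For anyone sifting through similar logs, here is a minimal Python sketch that tallies ovsdb-server log levels, lists the peers dropped by the inactivity probe, and pulls the row UUIDs out of an OVSDB referential integrity error. It is not part of the tooling used in these tests. The function names (summarize_ovs_log, parse_ovsdb_error) and the regular expressions are assumptions inferred only from the line formats shown above, so treat it as a starting point rather than a complete parser.

```python
import json
import re
from collections import Counter

# Assumed layout, inferred from the excerpt: timestamp|sequence|module|level|message.
# A stray space before the level (as in "ovsdb_file| INFO|") is tolerated.
OVS_LINE = re.compile(
    r"^(?P<ts>[^|]+)\|(?P<seq>\d+)\|(?P<module>[^|]+)\|\s*(?P<level>[A-Z]+)\|(?P<msg>.*)$"
)

# Message emitted when a peer stops answering inactivity probes.
PROBE_TIMEOUT = re.compile(
    r"^(?P<peer>\S+): no response to inactivity probe after (?P<secs>[\d.]+) seconds"
)

# "details" text of a referential integrity violation, as seen in the neutron log.
DANGLING_REF = re.compile(
    r"Table (?P<table>\w+) column (?P<column>\w+) row (?P<row>[0-9a-f-]+) "
    r"references nonexistent row (?P<missing_row>[0-9a-f-]+) "
    r"in table (?P<missing_table>\w+)\."
)


def summarize_ovs_log(lines):
    """Return (Counter of log levels, list of (timestamp, peer) probe timeouts)."""
    levels = Counter()
    dropped_peers = []
    for line in lines:
        m = OVS_LINE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like an OVS log line
        levels[m.group("level")] += 1
        probe = PROBE_TIMEOUT.match(m.group("msg"))
        if probe:
            dropped_peers.append((m.group("ts"), probe.group("peer")))
    return levels, dropped_peers


def parse_ovsdb_error(line):
    """Extract the JSON payload after 'OVSDB Error: ' and its row references.

    Returns None when the line carries no parseable OVSDB error.
    """
    marker = "OVSDB Error: "
    start = line.find(marker)
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores any trailing text.
        payload, _ = json.JSONDecoder().raw_decode(line, start + len(marker))
    except json.JSONDecodeError:
        return None
    result = {"error": payload.get("error")}
    ref = DANGLING_REF.search(payload.get("details", ""))
    if ref:
        result.update(ref.groupdict())
    return result


if __name__ == "__main__":
    sample = [
        "2016-01-28T04:19:57.769Z|00431|reconnect|ERR|tcp:10.138.7.225:36487: "
        "no response to inactivity probe after 5 seconds, disconnecting",
        "2016-01-28T04:20:08.288Z|00433|memory|INFO|cells:230931 monitors:190 sessions:190",
    ]
    levels, dropped = summarize_ovs_log(sample)
    print(dict(levels), dropped)
```

Feeding the ovsdb-server log to summarize_ovs_log and each neutron ERROR line to parse_ovsdb_error gives a quick view of which clients were disconnected and which Logical_Switch row points at a missing ACL row, which may help when correlating the two logs by timestamp.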
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss