Hi,

during robustness testing, where VMs are booted and deleted with nova
boot/delete in rapid succession, VMs get stuck in the spawning state after
a few test cycles. Presumably this is because OVS stops responding to port
additions and deletions, or rather because its responses to these requests
become painfully slow. Other requests towards ovs-vswitchd also fail to
complete in any reasonable time frame; ovs-appctl vlog/set is one example.
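
For reference, the way the unresponsiveness shows up from the shell is roughly
the following; a minimal sketch, the 5-second timeouts are arbitrary and
vlog/set dbg is just one normally harmless request:

# Probe the database side through ovs-vsctl:
ovs-vsctl --timeout=5 show

# Probe ovs-vswitchd through its control socket; in the stuck state
# requests like these do not complete in any reasonable time:
timeout 5 ovs-appctl vlog/list
timeout 5 ovs-appctl vlog/set dbg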

The only conclusion I can draw at the moment is that some thread (I've
observed both main and dpdk_watchdog3) never quiesces and thereby blocks the
ovsrcu_synchronize() operation indefinitely, and there is no fall-back to get
out of this. The least intrusive recovery seems to be restarting the
openvswitch-switch service, but that seems to cause other issues longer term.
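
To pin down where the non-quiescing thread actually sits, thread backtraces
from the live daemon are probably the most useful data point; a minimal
sketch, assuming gdb and debug symbols for ovs-vswitchd are available on the
compute node:

# Dump backtraces of all ovs-vswitchd threads (main, dpdk_watchdog3,
# vhost_thread2, the pmd threads, ...); gdb attaches, dumps and detaches:
gdb -batch -ex 'thread apply all bt' -p "$(pidof ovs-vswitchd)" > /tmp/ovs-threads.txt

# Recovery, as noted above, currently needs a full restart:
service openvswitch-switch restart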

When this happens, the following can be observed in the ovs-vswitchd log:

2016-01-24T20:36:14.601Z|02742|ovs_rcu(vhost_thread2)|WARN|blocked 1000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:15.600Z|02743|ovs_rcu(vhost_thread2)|WARN|blocked 2000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:17.601Z|02744|ovs_rcu(vhost_thread2)|WARN|blocked 4000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:21.600Z|02745|ovs_rcu(vhost_thread2)|WARN|blocked 8000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:24.511Z|00001|ovs_rcu(urcu1)|WARN|blocked 1000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:24.846Z|08246|ovs_rcu|WARN|blocked 1000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:25.511Z|00002|ovs_rcu(urcu1)|WARN|blocked 2000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:25.846Z|08247|ovs_rcu|WARN|blocked 2000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:27.510Z|00003|ovs_rcu(urcu1)|WARN|blocked 4000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:27.847Z|08248|ovs_rcu|WARN|blocked 4000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:29.600Z|02746|ovs_rcu(vhost_thread2)|WARN|blocked 16000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:31.510Z|00004|ovs_rcu(urcu1)|WARN|blocked 8000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:31.846Z|08249|ovs_rcu|WARN|blocked 8000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:39.511Z|00005|ovs_rcu(urcu1)|WARN|blocked 16000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:39.846Z|08250|ovs_rcu|WARN|blocked 16000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:45.600Z|02747|ovs_rcu(vhost_thread2)|WARN|blocked 32000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:55.510Z|00006|ovs_rcu(urcu1)|WARN|blocked 32000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:55.846Z|08251|ovs_rcu|WARN|blocked 32000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:37:17.600Z|02748|ovs_rcu(vhost_thread2)|WARN|blocked 64000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:37:27.510Z|00007|ovs_rcu(urcu1)|WARN|blocked 64000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:37:27.846Z|08252|ovs_rcu|WARN|blocked 64000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:38:21.601Z|02749|ovs_rcu(vhost_thread2)|WARN|blocked 128000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:38:31.511Z|00008|ovs_rcu(urcu1)|WARN|blocked 128000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:38:31.846Z|08253|ovs_rcu|WARN|blocked 128000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:40:29.600Z|02750|ovs_rcu(vhost_thread2)|WARN|blocked 256000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:40:39.510Z|00009|ovs_rcu(urcu1)|WARN|blocked 256000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:40:39.846Z|08254|ovs_rcu|WARN|blocked 256000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:44:45.601Z|02751|ovs_rcu(vhost_thread2)|WARN|blocked 512000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:44:55.510Z|00010|ovs_rcu(urcu1)|WARN|blocked 512000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:44:55.846Z|08255|ovs_rcu|WARN|blocked 512000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:53:17.600Z|02752|ovs_rcu(vhost_thread2)|WARN|blocked 1024000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:53:27.510Z|00011|ovs_rcu(urcu1)|WARN|blocked 1024000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:53:27.847Z|08256|ovs_rcu|WARN|blocked 1024000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:10:21.600Z|02753|ovs_rcu(vhost_thread2)|WARN|blocked 2048000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:10:31.510Z|00012|ovs_rcu(urcu1)|WARN|blocked 2048000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:10:31.846Z|08257|ovs_rcu|WARN|blocked 2048000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:44:29.600Z|02754|ovs_rcu(vhost_thread2)|WARN|blocked 4096000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:44:39.510Z|00013|ovs_rcu(urcu1)|WARN|blocked 4096000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:44:39.846Z|08258|ovs_rcu|WARN|blocked 4096000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T22:52:45.601Z|02755|ovs_rcu(vhost_thread2)|WARN|blocked 8192000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T22:52:55.510Z|00014|ovs_rcu(urcu1)|WARN|blocked 8192000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T22:52:55.846Z|08259|ovs_rcu|WARN|blocked 8192000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T01:09:17.600Z|02756|ovs_rcu(vhost_thread2)|WARN|blocked 16384000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T01:09:27.511Z|00015|ovs_rcu(urcu1)|WARN|blocked 16384000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T01:09:27.847Z|08260|ovs_rcu|WARN|blocked 16384000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T05:42:21.600Z|02757|ovs_rcu(vhost_thread2)|WARN|blocked 32768000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T05:42:31.510Z|00016|ovs_rcu(urcu1)|WARN|blocked 32768000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T05:42:31.846Z|08261|ovs_rcu|WARN|blocked 32768000 ms waiting for dpdk_watchdog3 to quiesce


Is this a known issue?

The issue can be reproduced by booting multiple VMs on the same compute
node; it seems to be much easier to reproduce if each VM has several
vNICs. Then delete the VMs in a loop, for example:

for (( i=1; i<=$vm_count; i++ )); do nova delete "test_vm$i"; done
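
For completeness, the full cycle we run is roughly the sketch below; the
flavor, image and net-id values and the availability-zone host are
placeholders for whatever the deployment provides, and the two --nic
arguments are only there to give each VM several vNICs:

vm_count=10

# Boot the VMs on the same compute node, each with several vNICs
# (flavor/image/net IDs and the host name are placeholders):
for (( i=1; i<=$vm_count; i++ )); do
    nova boot --flavor m1.small --image cirros \
        --nic net-id=NET_UUID_1 --nic net-id=NET_UUID_2 \
        --availability-zone nova:COMPUTE_HOST \
        "test_vm$i"
done

# Wait until all VMs are ACTIVE, then delete them again and repeat:
for (( i=1; i<=$vm_count; i++ )); do nova delete "test_vm$i"; done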

Regards,

Patrik
