Hi,
during robustness testing, where VMs are booted and deleted with nova
boot/delete in rather rapid succession, VMs get stuck in the spawning state
after a few test cycles. Presumably this is because OVS stops responding to
port additions and deletions, or rather because its responses to these
requests become painfully slow. Other requests to ovs-vswitchd also fail to
complete within any reasonable time frame; ovs-appctl vlog/set is one
example.
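For illustration, this is how I typically observe the hang from the shell
(the module and level here are arbitrary; any appctl command behaves the
same way once the box is in this state):

  # Normally returns almost instantly; in the stuck state it hits the
  # 10-second timeout instead of completing.
  timeout 10 ovs-appctl vlog/set ofproto:file:dbg || echo "ovs-appctl did not complete"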
The only conclusion I can draw at the moment is that some thread (I have
observed both main and dpdk_watchdog3) blocks the ovsrcu_synchronize()
operation for an "infinite" time, and there is no fall-back to get out of
this. The minimum recovery action seems to be a restart of the
openvswitch-switch service, but that in turn seems to cause other issues in
the longer term.
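Next time it happens I plan to capture backtraces of all vswitchd threads
to see what dpdk_watchdog3 is actually doing instead of quiescing; a rough
sketch with plain gdb (nothing OVS-specific, output path is arbitrary):

  # Attach to the running ovs-vswitchd and dump every thread's backtrace.
  gdb -batch -ex "thread apply all bt" -p "$(pidof ovs-vswitchd)" \
      > /tmp/ovs-vswitchd-backtraces.txt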
When this happens, the following can be observed in the vswitchd log:
2016-01-24T20:36:14.601Z|02742|ovs_rcu(vhost_thread2)|WARN|blocked 1000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:15.600Z|02743|ovs_rcu(vhost_thread2)|WARN|blocked 2000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:17.601Z|02744|ovs_rcu(vhost_thread2)|WARN|blocked 4000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:21.600Z|02745|ovs_rcu(vhost_thread2)|WARN|blocked 8000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:24.511Z|00001|ovs_rcu(urcu1)|WARN|blocked 1000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:24.846Z|08246|ovs_rcu|WARN|blocked 1000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:25.511Z|00002|ovs_rcu(urcu1)|WARN|blocked 2000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:25.846Z|08247|ovs_rcu|WARN|blocked 2000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:27.510Z|00003|ovs_rcu(urcu1)|WARN|blocked 4000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:27.847Z|08248|ovs_rcu|WARN|blocked 4000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:29.600Z|02746|ovs_rcu(vhost_thread2)|WARN|blocked 16000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:31.510Z|00004|ovs_rcu(urcu1)|WARN|blocked 8000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:31.846Z|08249|ovs_rcu|WARN|blocked 8000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:39.511Z|00005|ovs_rcu(urcu1)|WARN|blocked 16000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:39.846Z|08250|ovs_rcu|WARN|blocked 16000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:45.600Z|02747|ovs_rcu(vhost_thread2)|WARN|blocked 32000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:55.510Z|00006|ovs_rcu(urcu1)|WARN|blocked 32000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:55.846Z|08251|ovs_rcu|WARN|blocked 32000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:37:17.600Z|02748|ovs_rcu(vhost_thread2)|WARN|blocked 64000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:37:27.510Z|00007|ovs_rcu(urcu1)|WARN|blocked 64000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:37:27.846Z|08252|ovs_rcu|WARN|blocked 64000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:38:21.601Z|02749|ovs_rcu(vhost_thread2)|WARN|blocked 128000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:38:31.511Z|00008|ovs_rcu(urcu1)|WARN|blocked 128000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:38:31.846Z|08253|ovs_rcu|WARN|blocked 128000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:40:29.600Z|02750|ovs_rcu(vhost_thread2)|WARN|blocked 256000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:40:39.510Z|00009|ovs_rcu(urcu1)|WARN|blocked 256000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:40:39.846Z|08254|ovs_rcu|WARN|blocked 256000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:44:45.601Z|02751|ovs_rcu(vhost_thread2)|WARN|blocked 512000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:44:55.510Z|00010|ovs_rcu(urcu1)|WARN|blocked 512000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:44:55.846Z|08255|ovs_rcu|WARN|blocked 512000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:53:17.600Z|02752|ovs_rcu(vhost_thread2)|WARN|blocked 1024000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:53:27.510Z|00011|ovs_rcu(urcu1)|WARN|blocked 1024000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:53:27.847Z|08256|ovs_rcu|WARN|blocked 1024000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:10:21.600Z|02753|ovs_rcu(vhost_thread2)|WARN|blocked 2048000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:10:31.510Z|00012|ovs_rcu(urcu1)|WARN|blocked 2048000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:10:31.846Z|08257|ovs_rcu|WARN|blocked 2048000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:44:29.600Z|02754|ovs_rcu(vhost_thread2)|WARN|blocked 4096000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:44:39.510Z|00013|ovs_rcu(urcu1)|WARN|blocked 4096000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:44:39.846Z|08258|ovs_rcu|WARN|blocked 4096000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T22:52:45.601Z|02755|ovs_rcu(vhost_thread2)|WARN|blocked 8192000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T22:52:55.510Z|00014|ovs_rcu(urcu1)|WARN|blocked 8192000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T22:52:55.846Z|08259|ovs_rcu|WARN|blocked 8192000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T01:09:17.600Z|02756|ovs_rcu(vhost_thread2)|WARN|blocked 16384000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T01:09:27.511Z|00015|ovs_rcu(urcu1)|WARN|blocked 16384000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T01:09:27.847Z|08260|ovs_rcu|WARN|blocked 16384000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T05:42:21.600Z|02757|ovs_rcu(vhost_thread2)|WARN|blocked 32768000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T05:42:31.510Z|00016|ovs_rcu(urcu1)|WARN|blocked 32768000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T05:42:31.846Z|08261|ovs_rcu|WARN|blocked 32768000 ms waiting for dpdk_watchdog3 to quiesce
Is this a known issue?
The issue can be reproduced by booting multiple VMs on the same compute
node; it seems to be much easier to reproduce if each VM has several
vNICs. Then delete the VMs in a loop, for example:
for (( i=1; i<=$vm_count; i++ )); do nova delete "test_vm$i"; done
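For completeness, the full cycle I run is roughly the following; the image,
flavor, network and host names are placeholders for our local setup:

  # Boot a batch of VMs with two vNICs each on the same compute node,
  # give them a moment, then delete them all again. Repeating this a few
  # times is usually enough to get a VM stuck in spawning.
  vm_count=8
  for (( i=1; i<=$vm_count; i++ )); do
      nova boot --image "$IMAGE" --flavor "$FLAVOR" \
           --availability-zone "nova:$COMPUTE_HOST" \
           --nic net-id="$NET1" --nic net-id="$NET2" "test_vm$i"
  done
  sleep 60
  for (( i=1; i<=$vm_count; i++ )); do nova delete "test_vm$i"; done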
Regards,
Patrik