** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1813789
Title:
Evacuate test intermittently fails with network-vif-plugged timeout
exception
Status in OpenStack Compute (nova):
Fix Released
Bug description:
The nova-live-migration job has 2 nodes and in addition to running
live migration tests it also runs evacuate tests for image-backed and
volume-backed servers.
Seeing intermittent failures, with debug in this change:
https://review.openstack.org/#/c/571325/
It looks like the network-vif-plugged event is coming to the API
before nova-compute has registered a callback for that event, so nova-
compute does not process the event and then times out waiting for the
event to complete by the time it registered.
The API processes the network-vif-plugged for that server here:
http://logs.openstack.org/25/571325/10/gate/nova-live-
migration/52e1cd0/logs/screen-n-api.txt.gz#_Jan_29_01_11_49_707004
Jan 29 01:11:49.707004 ubuntu-xenial-rax-ord-0002201755
[email protected][22319]: DEBUG nova.api.openstack.wsgi [req-
3ed0ada1-7328-4d5a-a3ea-da34dcdb252d req-1f5aeede-
83c6-44d5-afd4-b5435e71d61f service nova] Action: 'create', calling
method: <bound method ServerExternalEventsController.create of
<nova.api.openstack.compute.server_external_events.ServerExternalEventsController
object at 0x7fb0133aa050>>, body: {"events": [{"status": "completed",
"tag": "e241f79f-fb0d-4961-b0c8-aea9de2755bf", "name": "network-vif-
plugged", "server_uuid": "2e82ddcd-75b8-4a41-8ecd-ce175adbdc67"}]}
{{(pid=22322) _process_stack
/opt/stack/new/nova/nova/api/openstack/wsgi.py:520}}
Jan 29 01:11:49.765687 ubuntu-xenial-rax-ord-0002201755
[email protected][22319]: INFO
nova.api.openstack.compute.server_external_events
[req-3ed0ada1-7328-4d5a-a3ea-da34dcdb252d
req-1f5aeede-83c6-44d5-afd4-b5435e71d61f service nova] Creating event
network-vif-plugged:e241f79f-fb0d-4961-b0c8-aea9de2755bf for instance
2e82ddcd-75b8-4a41-8ecd-ce175adbdc67 on ubuntu-xenial-rax-ord-0002201758
Jan 29 01:11:49.776578 ubuntu-xenial-rax-ord-0002201755
[email protected][22319]: DEBUG nova.compute.api
[req-3ed0ada1-7328-4d5a-a3ea-da34dcdb252d
req-1f5aeede-83c6-44d5-afd4-b5435e71d61f service nova] Instance
2e82ddcd-75b8-4a41-8ecd-ce175adbdc67 is migrating, copying events to all
relevant hosts: set([u'ubuntu-xenial-rax-ord-0002201758',
u'ubuntu-xenial-rax-ord-0002201755']) {{(pid=22322) _get_relevant_hosts
/opt/stack/new/nova/nova/compute/api.py:4818}}
Jan 29 01:11:49.786947 ubuntu-xenial-rax-ord-0002201755
[email protected][22319]: INFO nova.api.openstack.requestlog
[req-3ed0ada1-7328-4d5a-a3ea-da34dcdb252d
req-1f5aeede-83c6-44d5-afd4-b5435e71d61f service nova] 10.210.224.40 "POST
/compute/v2.1/os-server-external-events" status: 200 len: 183 microversion: 2.1
time: 0.084393
Which is weird because that's before the vif plugging timeout in the
compute logs. From the compute logs, we plugged the vif here:
Jan 29 01:11:55.571807 ubuntu-xenial-rax-ord-0002201755 nova-
compute[15903]: INFO os_vif [None req-252304a1-6eff-4ff3-aa4d-
b4e0ab87601c demo admin] Successfully plugged vif
VIFOpenVSwitch(active=False,address=fa:16:3e:66:03:76,bridge_name='br-
int',has_traffic_filtering=True,id=e241f79f-
fb0d-4961-b0c8-aea9de2755bf,network=Network(2812d9e2-bfa8-4397-9a81-5dab9fe5a03e),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name
='tape241f79f-fb')
So it looks like maybe we're getting the network-vif-plugged event
before we're ready for it and so we miss the event and then timeout?
That's pretty weird though because the libvirt driver's spawn method
is what should be registering and waiting for the vif plugged event:
https://github.com/openstack/nova/blob/c134feda3d9527dbc9735e4ae9cd35c4782f1fb4/nova/virt/libvirt/driver.py#L5673
Looks like the compute does register the event callback for network-
vif-plugged here:
Jan 29 01:11:55.567189 ubuntu-xenial-rax-ord-0002201755 nova-
compute[15903]: DEBUG nova.compute.manager [None req-252304a1-6eff-
4ff3-aa4d-b4e0ab87601c demo admin] [instance: 2e82ddcd-75b8-4a41-8ecd-
ce175adbdc67] Preparing to wait for external event network-vif-
plugged-e241f79f-fb0d-4961-b0c8-aea9de2755bf {{(pid=15903)
prepare_for_instance_event
/opt/stack/new/nova/nova/compute/manager.py:325}}
Which is too late:
Jan 29 01:11:49.707004 ubuntu-xenial-rax-ord-0002201755
[email protected][22319]: DEBUG nova.api.openstack.wsgi [req-
3ed0ada1-7328-4d5a-a3ea-da34dcdb252d req-1f5aeede-
83c6-44d5-afd4-b5435e71d61f service nova] Action: 'create', calling
method: <bound method ServerExternalEventsController.create of
<nova.api.openstack.compute.server_external_events.ServerExternalEventsController
object at 0x7fb0133aa050>>, body: {"events": [{"status": "completed",
"tag": "e241f79f-fb0d-4961-b0c8-aea9de2755bf", "name": "network-vif-
plugged", "server_uuid": "2e82ddcd-75b8-4a41-8ecd-ce175adbdc67"}]}
{{(pid=22322) _process_stack
/opt/stack/new/nova/nova/api/openstack/wsgi.py:520}}
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Timeout%20waiting%20for%20%5B
('network-vif-
plugged'%5C%22%20AND%20message%3A%5C%22for%20instance%20with%20vm_state%20error%20and%20task_state%20rebuild_spawning.%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d
4 hits in the last 7 days, check and gate, all failures.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1813789/+subscriptions
--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp