** Description changed:

  * During the periodic task _heal_instance_info_cache, the instance_info_caches
  are not updated using instance port_ids taken from neutron, but from the nova db.
  * This causes existing VMs to lose their network interfaces after
  [Test Plan]
  * This bug is reproducible on Bionic/Queens clouds.
  1) Deploy the following Juju bundle: https://paste.ubuntu.com/p/HgsqZfsDGh/
  2) Run the following script: https://paste.ubuntu.com/p/c4VDkqyR2z/
  3) If the script finishes with "Port not found" , the bug is still present.
  [Where problems could occur]
  Instances created prior to the OpenStack Newton release that have more
  than one interface will not have associated information in the
  virtual_interfaces table that is required to repopulate the cache with
  interfaces in the same order they were attached prior. In the unlikely
  event that this occurs and you are using OpenStack release Queens or
  Rocky, it will be necessary to manually populate this table.
  OpenStack Stein has a patch that adds support for generating this data.
  Since, as things stand, the guest will be unable to identify its network
  information at all in the event the cache gets purged, and given the
  hopefully low risk that a VM was created prior to Newton, we hope the
  potential for this regression is very low.
+ [Discussion]
+ SRU team, please review the most recent version of nova
+ 2:17.0.13-0ubuntu3 in the unapproved queue. The older version can be rejected.
  During periodic task _heal_instance_info_cache the
  instance_info_caches are not updated using instance port_ids taken
  from neutron, but from nova db.
  Sometimes, perhaps because of some race condition, it's possible to
  lose some ports from instance_info_caches. The periodic task
  _heal_instance_info_cache should clean this up (add missing records),
  but in fact it does not work this way.
  How does it look now?
  During the periodic task, _heal_instance_info_cache
  uses network_api to get instance_nw_info (instance_info_caches):
-           try:
-               # Call to network API to get instance info.. this will
-               # force an update to the instance's info_cache
-               self.network_api.get_instance_nw_info(context, instance)
+           try:
+               # Call to network API to get instance info.. this will
+               # force an update to the instance's info_cache
+               self.network_api.get_instance_nw_info(context, instance)
  self.network_api.get_instance_nw_info() is listed below:
  and it uses _build_network_info_model() without networks and port_ids
  parameters (because we're not adding any new interface to instance):
  Next: _gather_port_ids_and_networks() generates the list of instance
  networks and port_ids:
-     networks, port_ids = self._gather_port_ids_and_networks(
-               context, instance, networks, port_ids, client)
+     networks, port_ids = self._gather_port_ids_and_networks(
+               context, instance, networks, port_ids, client)
  As we can see, _gather_port_ids_and_networks() takes the port list
  from the DB:
  And that's it. When we lose a port, it's not possible to add it again with
  this periodic task.
  The only way is to clear the device_id field in the neutron port object and
  re-attach the interface using `nova interface-attach`.
  When the interface is missing and there is no port configured on the
  compute host (for example after a compute reboot), the interface is not
  added to the instance, and from neutron's point of view the port state is DOWN.
  When the interface is missing from the cache and we hard-reboot the
  instance, it is not added as a tap interface in the xml file, so we don't
  have the network on the host.
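The difference between the two sources of truth can be sketched in a few lines of Python. This is only an illustration, not the actual nova code: the helper names port_ids_from_cache and port_ids_from_neutron are hypothetical, and the port dictionaries are reduced to the "id" and "device_id" fields. The IDs are the ones from the reproduced example in this report.

```python
# Hypothetical, simplified stand-ins for illustration; not the nova code.

def port_ids_from_cache(cached_network_info):
    """Port IDs as the periodic task sees them: taken from the cache."""
    return [vif["id"] for vif in cached_network_info]


def port_ids_from_neutron(neutron_ports, instance_uuid):
    """Port IDs bound to the instance according to neutron (device_id match)."""
    return [port["id"] for port in neutron_ports
            if port["device_id"] == instance_uuid]


instance_uuid = "a64ed18d-9868-4bf0-90d3-d710d278922d"

# The cache has lost one port.
cached_network_info = [
    {"id": "b89d6863-fb4c-405c-89f9-698bd9773ad6"},
    {"id": "a74c9ee8-c426-48ef-890f-3988ecbe95ff"},
    {"id": "71e6c6ad-8016-450f-93f2-75e7e014084d"},
]

# Neutron still reports all four ports for this instance.
neutron_ports = [
    {"id": port_id, "device_id": instance_uuid}
    for port_id in (
        "6c230305-43f8-42ec-9936-61fe67551168",
        "71e6c6ad-8016-450f-93f2-75e7e014084d",
        "a74c9ee8-c426-48ef-890f-3988ecbe95ff",
        "b89d6863-fb4c-405c-89f9-698bd9773ad6",
    )
]

cached = port_ids_from_cache(cached_network_info)
actual = port_ids_from_neutron(neutron_ports, instance_uuid)

# Ports that a cache-only refresh can never bring back.
missing = sorted(set(actual) - set(cached))
print(missing)
```

A refresh that starts from the cached list can only re-describe ports it already knows about, which is why the lost port never reappears.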
  Steps to reproduce
  1. Spawn devstack
  2. Spawn a VM inside devstack with multiple ports (for example also from 2
  different networks)
  3. Update the DB row, drop one interface from interfaces_list
  4. Hard-Reboot the instance
  5. See that nova list shows the instance without one address, but nova
  interface-list shows all addresses
  6. See that one port is missing in the instance xml files
  7. In theory _heal_instance_info_cache should fix this, but it relies
  on memory, not on a fresh list of instance ports taken from neutron.
  Reproduced Example
  1. Spawn VM with 1 private network port
  nova boot --flavor m1.small --image cirros-0.3.5-x86_64-disk --nic 
net-name=private  test-2
  2. Attach ports to have 2 private and 2 public interfaces
  nova list:
  | a64ed18d-9868-4bf0-90d3-d710d278922d | test-2 | ACTIVE | -          | 
Running     | public=2001:db8::e,, 2001:db8::c,; 
fdda:5d77:e18e:0:f816:3eff:fe53:231c, |
  So we see 4 ports:
  stack@mjozefcz-devstack-ptg:~$ nova interface-list 
  | Port State | Port ID                              | Net ID                  
             | IP addresses                                  | MAC Addr         
  | ACTIVE     | 6c230305-43f8-42ec-9936-61fe67551168 | 
96343d33-5dd2-4289-b0cc-e6c664c2ddd9 |,fdda:5d77:e18e:0:f816:3eff:fee8:3333 | fa:16:3e:e8:33:33 |
  | ACTIVE     | 71e6c6ad-8016-450f-93f2-75e7e014084d | 
9e702a96-2744-40a2-a649-33f935d83ad3 |,2001:db8::c                  
     | fa:16:3e:6d:dc:85 |
  | ACTIVE     | a74c9ee8-c426-48ef-890f-3988ecbe95ff | 
9e702a96-2744-40a2-a649-33f935d83ad3 |,2001:db8::e                  
     | fa:16:3e:cf:0c:e0 |
  | ACTIVE     | b89d6863-fb4c-405c-89f9-698bd9773ad6 | 
96343d33-5dd2-4289-b0cc-e6c664c2ddd9 |,fdda:5d77:e18e:0:f816:3eff:fe53:231c | fa:16:3e:53:23:1c |
  We can also see 4 tap interfaces in xml file:
  stack@mjozefcz-devstack-ptg:~$ sudo virsh dumpxml instance-00000002 | grep -i 
-     <target dev='tap6c230305-43'/>
-     <target dev='tapb89d6863-fb'/>
-     <target dev='tapa74c9ee8-c4'/>
-     <target dev='tap71e6c6ad-80'/>
+     <target dev='tap6c230305-43'/>
+     <target dev='tapb89d6863-fb'/>
+     <target dev='tapa74c9ee8-c4'/>
+     <target dev='tap71e6c6ad-80'/>
  3. Now let's 'corrupt' the instance_info_caches for this specific VM.
  We also noticed a race condition that causes the same problem, but
  we're unable to reproduce it in a development environment.
  Original one:
  mysql> select * from instance_info_caches where 
  *************************** 1. row ***************************
-  created_at: 2018-02-26 21:25:31
-  updated_at: 2018-02-26 21:29:17
-  deleted_at: NULL
-          id: 2
+  created_at: 2018-02-26 21:25:31
+  updated_at: 2018-02-26 21:29:17
+  deleted_at: NULL
+          id: 2
  network_info: [{"profile": {}, "ovs_interfaceid": 
"6c230305-43f8-42ec-9936-61fe67551168", "preserve_on_delete": false, "network": 
{"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 6, "type": 
"fixed", "floating_ips": [], "address": 
"fdda:5d77:e18e:0:f816:3eff:fee8:3333"}], "version": 6, "meta": 
{"ipv6_address_mode": "slaac", "dhcp_server": 
"fdda:5d77:e18e:0:f816:3eff:fee7:b04"}, "dns": [], "routes": [], "cidr": 
"fdda:5d77:e18e::/64", "gateway": {"meta": {}, "version": 6, "type": "gateway", 
"address": "fdda:5d77:e18e::1"}}, {"ips": [{"meta": {}, "version": 4, "type": 
"fixed", "floating_ips": [], "address": ""}], "version": 4, "meta": 
{"dhcp_server": ""}, "dns": [], "routes": [], "cidr": "", 
"gateway": {"meta": {}, "version": 4, "type": "gateway", "address": 
""}}], "meta": {"injected": false, "tenant_id": 
"0314943f52014a5b9bc56b73bec475e6", "mtu": 1450}, "id": 
"96343d33-5dd2-4289-b0cc-e6c664c2ddd9", "label": "private"}, "devname": 
"tap6c230305-43", "vnic_type": "normal", "qbh_params": null, "meta": {}, 
"details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": 
true}, "address": "fa:16:3e:e8:33:33", "active": true, "type": "ovs", "id": 
"6c230305-43f8-42ec-9936-61fe67551168", "qbg_params": null}, {"profile": {}, 
"ovs_interfaceid": "b89d6863-fb4c-405c-89f9-698bd9773ad6", 
"preserve_on_delete": false, "network": {"bridge": "br-int", "subnets": 
[{"ips": [{"meta": {}, "version": 6, "type": "fixed", "floating_ips": [], 
"address": "fdda:5d77:e18e:0:f816:3eff:fe53:231c"}], "version": 6, "meta": 
{"ipv6_address_mode": "slaac", "dhcp_server": 
"fdda:5d77:e18e:0:f816:3eff:fee7:b04"}, "dns": [], "routes": [], "cidr": 
"fdda:5d77:e18e::/64", "gateway": {"meta": {}, "version": 6, "type": "gateway", 
"address": "fdda:5d77:e18e::1"}}, {"ips": [{"meta": {}, "version": 4, "type": 
"fixed", "floating_ips": [], "address": ""}], "version": 4, "meta": 
{"dhcp_server": ""}, "dns": [], "routes": [], "cidr": "", 
"gateway": {"meta": {}, "version": 4, "type": "gateway", "address": 
""}}], "meta": {"injected": false, "tenant_id": 
"0314943f52014a5b9bc56b73bec475e6", "mtu": 1450}, "id": 
"96343d33-5dd2-4289-b0cc-e6c664c2ddd9", "label": "private"}, "devname": 
"tapb89d6863-fb", "vnic_type": "normal", "qbh_params": null, "meta": {}, 
"details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": 
true}, "address": "fa:16:3e:53:23:1c", "active": true, "type": "ovs", "id": 
"b89d6863-fb4c-405c-89f9-698bd9773ad6", "qbg_params": null}, {"profile": {}, 
"ovs_interfaceid": "a74c9ee8-c426-48ef-890f-3988ecbe95ff", 
"preserve_on_delete": false, "network": {"bridge": "br-int", "subnets": 
[{"ips": [{"meta": {}, "version": 6, "type": "fixed", "floating_ips": [], 
"address": "2001:db8::e"}], "version": 6, "meta": {}, "dns": [], "routes": [], 
"cidr": "2001:db8::/64", "gateway": {"meta": {}, "version": 6, "type": 
"gateway", "address": "2001:db8::2"}}, {"ips": [{"meta": {}, "version": 4, 
"type": "fixed", "floating_ips": [], "address": ""}], "version": 4, 
"meta": {}, "dns": [], "routes": [], "cidr": "", "gateway": 
{"meta": {}, "version": 4, "type": "gateway", "address": ""}}], 
"meta": {"injected": false, "tenant_id": "9c6f74dab29f4c738e82320075fa1f57", 
"mtu": 1500}, "id": "9e702a96-2744-40a2-a649-33f935d83ad3", "label": "public"}, 
"devname": "tapa74c9ee8-c4", "vnic_type": "normal", "qbh_params": null, "meta": 
{}, "details": {"port_filter": true, "datapath_type": "system", 
"ovs_hybrid_plug": true}, "address": "fa:16:3e:cf:0c:e0", "active": true, 
"type": "ovs", "id": "a74c9ee8-c426-48ef-890f-3988ecbe95ff", "qbg_params": 
null}, {"profile": {}, "ovs_interfaceid": 
"71e6c6ad-8016-450f-93f2-75e7e014084d", "preserve_on_delete": false, "network": 
{"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 6, "type": 
"fixed", "floating_ips": [], "address": "2001:db8::c"}], "version": 6, "meta": 
{}, "dns": [], "routes": [], "cidr": "2001:db8::/64", "gateway": {"meta": {}, 
"version": 6, "type": "gateway", "address": "2001:db8::2"}}, {"ips": [{"meta": 
{}, "version": 4, "type": "fixed", "floating_ips": [], "address": 
""}], "version": 4, "meta": {}, "dns": [], "routes": [], "cidr": 
"", "gateway": {"meta": {}, "version": 4, "type": "gateway", 
"address": ""}}], "meta": {"injected": false, "tenant_id": 
"9c6f74dab29f4c738e82320075fa1f57", "mtu": 1500}, "id": 
"9e702a96-2744-40a2-a649-33f935d83ad3", "label": "public"}, "devname": 
"tap71e6c6ad-80", "vnic_type": "normal", "qbh_params": null, "meta": {}, 
"details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": 
true}, "address": "fa:16:3e:6d:dc:85", "active": true, "type": "ovs", "id": 
"71e6c6ad-8016-450f-93f2-75e7e014084d", "qbg_params": null}]
  instance_uuid: a64ed18d-9868-4bf0-90d3-d710d278922d
-     deleted: 0
+     deleted: 0
  1 row in set (0.00 sec)
  Modified one (I removed the first port from the list):
  mysql> select * from instance_info_caches where 
  *************************** 1. row ***************************
-  created_at: 2018-02-26 21:25:31
-  updated_at: 2018-02-26 21:29:17
-  deleted_at: NULL
-          id: 2
+  created_at: 2018-02-26 21:25:31
+  updated_at: 2018-02-26 21:29:17
+  deleted_at: NULL
+          id: 2
  network_info: [{"profile": {}, "ovs_interfaceid": 
"b89d6863-fb4c-405c-89f9-698bd9773ad6", "preserve_on_delete": false, "network": 
{"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 6, "type": 
"fixed", "floating_ips": [], "address": 
"fdda:5d77:e18e:0:f816:3eff:fe53:231c"}], "version": 6, "meta": 
{"ipv6_address_mode": "slaac", "dhcp_server": 
"fdda:5d77:e18e:0:f816:3eff:fee7:b04"}, "dns": [], "routes": [], "cidr": 
"fdda:5d77:e18e::/64", "gateway": {"meta": {}, "version": 6, "type": "gateway", 
"address": "fdda:5d77:e18e::1"}}, {"ips": [{"meta": {}, "version": 4, "type": 
"fixed", "floating_ips": [], "address": ""}], "version": 4, "meta": 
{"dhcp_server": ""}, "dns": [], "routes": [], "cidr": "", 
"gateway": {"meta": {}, "version": 4, "type": "gateway", "address": 
""}}], "meta": {"injected": false, "tenant_id": 
"0314943f52014a5b9bc56b73bec475e6", "mtu": 1450}, "id": 
"96343d33-5dd2-4289-b0cc-e6c664c2ddd9", "label": "private"}, "devname": 
"tapb89d6863-fb", "vnic_type": "normal", "qbh_params": null, "meta": {}, 
"details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": 
true}, "address": "fa:16:3e:53:23:1c", "active": true, "type": "ovs", "id": 
"b89d6863-fb4c-405c-89f9-698bd9773ad6", "qbg_params": null}, {"profile": {}, 
"ovs_interfaceid": "a74c9ee8-c426-48ef-890f-3988ecbe95ff", 
"preserve_on_delete": false, "network": {"bridge": "br-int", "subnets": 
[{"ips": [{"meta": {}, "version": 6, "type": "fixed", "floating_ips": [], 
"address": "2001:db8::e"}], "version": 6, "meta": {}, "dns": [], "routes": [], 
"cidr": "2001:db8::/64", "gateway": {"meta": {}, "version": 6, "type": 
"gateway", "address": "2001:db8::2"}}, {"ips": [{"meta": {}, "version": 4, 
"type": "fixed", "floating_ips": [], "address": ""}], "version": 4, 
"meta": {}, "dns": [], "routes": [], "cidr": "", "gateway": 
{"meta": {}, "version": 4, "type": "gateway", "address": ""}}], 
"meta": {"injected": false, "tenant_id": "9c6f74dab29f4c738e82320075fa1f57", 
"mtu": 1500}, "id": "9e702a96-2744-40a2-a649-33f935d83ad3", "label": "public"}, 
"devname": "tapa74c9ee8-c4", "vnic_type": "normal", "qbh_params": null, "meta": 
{}, "details": {"port_filter": true, "datapath_type": "system", 
"ovs_hybrid_plug": true}, "address": "fa:16:3e:cf:0c:e0", "active": true, 
"type": "ovs", "id": "a74c9ee8-c426-48ef-890f-3988ecbe95ff", "qbg_params": 
null}, {"profile": {}, "ovs_interfaceid": 
"71e6c6ad-8016-450f-93f2-75e7e014084d", "preserve_on_delete": false, "network": 
{"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 6, "type": 
"fixed", "floating_ips": [], "address": "2001:db8::c"}], "version": 6, "meta": 
{}, "dns": [], "routes": [], "cidr": "2001:db8::/64", "gateway": {"meta": {}, 
"version": 6, "type": "gateway", "address": "2001:db8::2"}}, {"ips": [{"meta": 
{}, "version": 4, "type": "fixed", "floating_ips": [], "address": 
""}], "version": 4, "meta": {}, "dns": [], "routes": [], "cidr": 
"", "gateway": {"meta": {}, "version": 4, "type": "gateway", 
"address": ""}}], "meta": {"injected": false, "tenant_id": 
"9c6f74dab29f4c738e82320075fa1f57", "mtu": 1500}, "id": 
"9e702a96-2744-40a2-a649-33f935d83ad3", "label": "public"}, "devname": 
"tap71e6c6ad-80", "vnic_type": "normal", "qbh_params": null, "meta": {}, 
"details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": 
true}, "address": "fa:16:3e:6d:dc:85", "active": true, "type": "ovs", "id": 
"71e6c6ad-8016-450f-93f2-75e7e014084d", "qbg_params": null}]
  instance_uuid: a64ed18d-9868-4bf0-90d3-d710d278922d
-     deleted: 0
+     deleted: 0
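The report does not show how the row was edited. One way to produce the modified network_info value is sketched below, assuming the column holds a JSON list of VIF entries as in the output above. The helper name drop_first_vif is hypothetical, and the entries are trimmed to two fields for brevity.

```python
import json


def drop_first_vif(network_info_json):
    """Return the network_info JSON with its first VIF entry removed."""
    vifs = json.loads(network_info_json)
    return json.dumps(vifs[1:])


# Trimmed-down version of the original row: only "id" and "devname" are kept.
original = json.dumps([
    {"id": "6c230305-43f8-42ec-9936-61fe67551168", "devname": "tap6c230305-43"},
    {"id": "b89d6863-fb4c-405c-89f9-698bd9773ad6", "devname": "tapb89d6863-fb"},
    {"id": "a74c9ee8-c426-48ef-890f-3988ecbe95ff", "devname": "tapa74c9ee8-c4"},
    {"id": "71e6c6ad-8016-450f-93f2-75e7e014084d", "devname": "tap71e6c6ad-80"},
])

modified = drop_first_vif(original)
remaining = [vif["devname"] for vif in json.loads(modified)]
print(remaining)
```

The resulting string would then have to be written back to the network_info column for the instance, which is a step this sketch deliberately leaves out.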
  4. Now let's take a look at `nova list`:
  stack@mjozefcz-devstack-ptg:~$ nova list | grep test-2
  | a64ed18d-9868-4bf0-90d3-d710d278922d | test-2 | ACTIVE | -          | 
Running     | public=2001:db8::e,, 2001:db8::c,; 
private=fdda:5d77:e18e:0:f816:3eff:fe53:231c, |
  So, as you can see, we are missing one interface (private).
  Nova interface-list shows it (because it calls neutron instead of nova
  stack@mjozefcz-devstack-ptg:~$ nova interface-list 
  | Port State | Port ID                              | Net ID                  
             | IP addresses                                  | MAC Addr         
  | ACTIVE     | 6c230305-43f8-42ec-9936-61fe67551168 | 
96343d33-5dd2-4289-b0cc-e6c664c2ddd9 |,fdda:5d77:e18e:0:f816:3eff:fee8:3333 | fa:16:3e:e8:33:33 |
  | ACTIVE     | 71e6c6ad-8016-450f-93f2-75e7e014084d | 
9e702a96-2744-40a2-a649-33f935d83ad3 |,2001:db8::c                  
     | fa:16:3e:6d:dc:85 |
  | ACTIVE     | a74c9ee8-c426-48ef-890f-3988ecbe95ff | 
9e702a96-2744-40a2-a649-33f935d83ad3 |,2001:db8::e                  
     | fa:16:3e:cf:0c:e0 |
  | ACTIVE     | b89d6863-fb4c-405c-89f9-698bd9773ad6 | 
96343d33-5dd2-4289-b0cc-e6c664c2ddd9 |,fdda:5d77:e18e:0:f816:3eff:fe53:231c | fa:16:3e:53:23:1c |
  5. During this time check the logs - yes, the
  _heal_instance_info_cache has been running for a while but without
  success - the port is still missing in the instance_info_caches table:
  Feb 26 22:12:03 mjozefcz-devstack-ptg nova-compute[27459]: DEBUG 
oslo_service.periodic_task [None req-ac707da5-3413-412c-b314-ab38db2134bc 
service nova] Running periodic task ComputeManager._heal_instance_info_cache 
{{(pid=27459) run_periodic_tasks 
  Feb 26 22:12:03 mjozefcz-devstack-ptg nova-compute[27459]: DEBUG 
nova.compute.manager [None req-ac707da5-3413-412c-b314-ab38db2134bc service 
nova] Starting heal instance info cache {{(pid=27459) _heal_instance_info_cache 
  Feb 26 22:12:04 mjozefcz-devstack-ptg nova-compute[27459]: DEBUG 
nova.compute.manager [None req-ac707da5-3413-412c-b314-ab38db2134bc service 
nova] [instance: a64ed18d-9868-4bf0-90d3-d710d278922d] Updated the network 
info_cache for instance {{(pid=27459) _heal_instance_info_cache 
  6. OK, so let's pretend that the customer restarts the VM.
  stack@mjozefcz-devstack-ptg:~$ nova reboot a64ed18d-9868-4bf0-90d3-d710d278922d --hard
  Request to reboot server <Server: test-2> has been accepted.
  7. And now check the connected interfaces - WOOPS, there is no
  `tap6c230305-43` on the list ;(
  stack@mjozefcz-devstack-ptg:~$ sudo virsh dumpxml instance-00000002  | grep 
-i tap
-     <target dev='tapb89d6863-fb'/>
-     <target dev='tapa74c9ee8-c4'/>
-     <target dev='tap71e6c6ad-80'/>
+     <target dev='tapb89d6863-fb'/>
+     <target dev='tapa74c9ee8-c4'/>
+     <target dev='tap71e6c6ad-80'/>
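The same check can be scripted. The sketch below assumes, judging only from the output in this report, that the tap device name is "tap" followed by the start of the port ID, truncated to 14 characters. The helper names expected_tap_name and missing_taps are hypothetical.

```python
def expected_tap_name(port_id):
    """Derive the tap device name from a port ID (pattern inferred from the output)."""
    return ("tap" + port_id)[:14]


def missing_taps(port_ids, xml_tap_names):
    """Return tap names expected from the port list but absent from the domain xml."""
    present = set(xml_tap_names)
    return [name for name in map(expected_tap_name, port_ids)
            if name not in present]


# Ports reported by `nova interface-list`.
port_ids = [
    "6c230305-43f8-42ec-9936-61fe67551168",
    "71e6c6ad-8016-450f-93f2-75e7e014084d",
    "a74c9ee8-c426-48ef-890f-3988ecbe95ff",
    "b89d6863-fb4c-405c-89f9-698bd9773ad6",
]

# Tap devices found in the domain xml after the hard reboot.
xml_tap_names = ["tapb89d6863-fb", "tapa74c9ee8-c4", "tap71e6c6ad-80"]

lost = missing_taps(port_ids, xml_tap_names)
print(lost)
```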
  Nova master branch, devstack

  [SRU]_heal_instance_info_cache periodic task bases on port list from
  nova db, not from neutron server
