Hi guys,
   I'm working on bug #1268955, which is caused by the neutron OVS agent/plugin 
not processing ports correctly when multiple ports exist for a single VM vif. I 
originally identified two potential solutions, but one of them requires a 
non-trivial change, and the other may result in a race condition. So I'm posting 
here to seek help. Please let me know if you have any comments or advice. 
Thanks in advance.

Bug description:

When the guest VM is running in HVM mode, neutron doesn't set the VLAN tag on 
the proper port, so the guest VM loses network connectivity.

Problem analysis:
When the VM is in HVM mode, OVS creates two ports and two interfaces for a 
single vif inside the VM. If the domID is x, one port/interface is named 
tapx.0, which is the QEMU-emulated NIC used when no PV drivers are installed; 
the other is named vifx.0, which is the Xen network frontend NIC used when the 
VM has PV drivers installed. Depending on whether PV drivers are present, 
either port/interface may be used. But the current OVS agent/plugin uses the 
VM's vif id (iface-id) to identify the port, so depending on the order in which 
ports are retrieved from OVS, only one of them will be processed by neutron. 
The network problem occurs when the port that ends up in use is not the one 
neutron processed (e.g. the one that got the VLAN tag).
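To illustrate the collision (a minimal sketch with made-up port records and a 
fake iface-id, not the actual agent code): keying the scanned ports by iface-id 
silently keeps only one of the two entries, whichever was scanned last.

```python
# Hypothetical sketch of the iface-id collision. Port records and the
# iface-id value are illustrative, not real neutron/OVS data.

# Two OVS ports created for a single HVM vif (domID = 5):
ports = [
    {"name": "tap5.0", "iface-id": "aabbccdd-1122", "iface-status": "inactive"},
    {"name": "vif5.0", "iface-id": "aabbccdd-1122", "iface-status": "active"},
]

# Keying by iface-id, as the agent effectively does, keeps only the
# port seen last in the scan; the other is silently dropped.
by_iface_id = {p["iface-id"]: p for p in ports}

print(len(by_iface_id))                      # 1 -- one port is lost
print(by_iface_id["aabbccdd-1122"]["name"])  # vif5.0 (last in scan order)
```

If the dropped port is the one the guest ends up using, it never gets its VLAN 
tag.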



Two of my potential solutions:

1.  Configure both ports regardless of which one is used in the end, so that 
both have the same configuration. This should resolve the problem. But the 
existing code uses the iface-id as the key for each port, and both tapx.0 and 
vifx.0 have the same iface-id. With this solution I would have to change the 
data structure to hold both ports and change the related functions; the 
required changes spread across many places, so it would take much more effort 
compared with the second option. I also have a concern that configuring the 
inactive port could introduce other issues, although I can't point to a 
concrete one currently.
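A rough sketch of the data-structure change I have in mind (illustrative names 
only; set_vlan_tag is a placeholder for the real ovsdb update, and the port 
records are made up): map each iface-id to a list of ports instead of a single 
port, then apply the same configuration to every port in the list.

```python
from collections import defaultdict

# Illustrative port records for one HVM vif (domID = 5):
ports = [
    {"name": "tap5.0", "iface-id": "aabbccdd-1122"},
    {"name": "vif5.0", "iface-id": "aabbccdd-1122"},
]

# iface-id -> list of ports, instead of iface-id -> single port.
ports_by_iface_id = defaultdict(list)
for p in ports:
    ports_by_iface_id[p["iface-id"]].append(p)

def set_vlan_tag(port, tag):
    # Placeholder for the real ovsdb call that sets the tag column.
    port["tag"] = tag

# Configure every port for the vif, active or not:
for p in ports_by_iface_id["aabbccdd-1122"]:
    set_vlan_tag(p, 1000)
```

The catch is exactly what I described above: every code path that assumes one 
port per iface-id would have to learn to iterate over the list.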



2.  When there are multiple candidates, OVS sets the "iface-status" field to 
"active" for the one taking effect; the others are marked "inactive". So the 
other solution is to return only the active port. If a switchover happens 
later, treat the newly active port as updated, and it will then be configured 
accordingly. This ensures the active port is configured properly, and the 
needed change is very limited. Please see the draft patch set for this 
solution: https://review.openstack.org/#/c/233498/
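The filtering logic amounts to something like this (a simplified sketch; the 
actual change is in the patch set above, and the port records and fallback 
behaviour here are illustrative assumptions):

```python
# Illustrative port records for one HVM vif:
ports = [
    {"name": "tap5.0", "iface-id": "aabbccdd-1122", "iface-status": "inactive"},
    {"name": "vif5.0", "iface-id": "aabbccdd-1122", "iface-status": "active"},
]

def active_port(candidates):
    """Pick the port OVS reports as active for a given iface-id."""
    for p in candidates:
        if p.get("iface-status") == "active":
            return p
    # Fallback assumption: if OVS reports none as active, take the
    # first candidate rather than dropping the vif entirely.
    return candidates[0] if candidates else None

print(active_port(ports)["name"])  # vif5.0
```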



But this approach introduces a race condition. For example, suppose the tag is 
set on tapx.0 first and the guest VM has connectivity via tapx.0; then the PV 
driver is loaded, so the active port switches to vifx.0. Depending on the 
neutron agent's polling interval, vifx.0 may remain untagged for a while, and 
during that window the connection is lost.


Could you share your insights? Thanks a lot.

B.R.
Jianghua Wang
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
