Hi all,

Wanted to share details of an issue we just discovered around hybrid ml2/ovs 
configuration under Icehouse.

We run ml2 on the API nodes, but the openvswitch plugin/ovs-agent on the
compute/network nodes.  We ran this split setup because under Havana it was
the only way we could get ml2 working correctly, and it was recommended by
an ml2 dev.  We kept this design because it continued to work under
Icehouse, seemingly without issue.  We upgraded from Havana to Icehouse
without too much trouble a couple of months ago.

However, we had not rebooted any compute nodes between the upgrade and this
week.  When the compute nodes came back up, instances that had been created
before the move to Icehouse did not start, because the vifs for them were
never created.

Exact error is:  
https://gist.githubusercontent.com/krislindgren/c1f4f79dc12403c4815d/raw/386ef0607f32088ad372a27e06e3606f6c1ac220/gistfile1.txt

It turns out that ports created under Havana were missing the
'ovs_hybrid_plug' property in their binding details, which prevented the
vif from being recreated on the compute host.  Ports for instances created
after the Icehouse upgrade did have this property, and those instances
started back up without a problem.

Specifically, the problem is in the neutron.ml2_port_bindings table.
Instances created before the upgrade had this for vif_details:

  {"port_filter": true}

Instances created after the upgrade had this:

  {"port_filter": true, "ovs_hybrid_plug": true}
Missing this flag caused the instances' vifs to never get plugged.  The
cause is this method:
https://github.com/openstack/nova/blob/2014.1.2/nova/virt/libvirt/vif.py#L464-L470
Because the ovs_hybrid_plug flag isn't in the vif_details,
vif.is_hybrid_plug_enabled() returns False, so instead of calling
plug_ovs_hybrid(), the driver calls plug_ovs_bridge().  plug_ovs_bridge()
only calls its super implementation, which is a no-op method, so the vif
never actually gets plugged.
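
Paraphrasing the linked method (a simplified sketch, not the verbatim nova
source):

  # rough paraphrase of nova/virt/libvirt/vif.py (2014.1.2)
  def plug_ovs(self, instance, vif):
      if self.get_firewall_required(vif) or vif.is_hybrid_plug_enabled():
          # creates the veth pair / linux bridge plumbing on the host
          self.plug_ovs_hybrid(instance, vif)
      else:
          # only calls the no-op super implementation, so nothing is created
          self.plug_ovs_bridge(instance, vif)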

We ended up solving this by manually adding the missing property to those
ports via MySQL, after which starting all the Havana instances worked
normally.  Here's the SQL update we used:
update ml2_port_bindings
set vif_details = '{"port_filter": true, "ovs_hybrid_plug": true}'
where vif_details not like '%ovs_hybrid_plug%';
Note: that update statement will overwrite the vif_details of ALL rows that
don't contain the ovs_hybrid_plug property.  This was fine for us, but you
should verify that it won't munge any of your data first.
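
If you want to be more conservative, narrowing the update to OVS ports
should work too (untested sketch; assumes your affected bindings all have
vif_type 'ovs'):

  update ml2_port_bindings
  set vif_details = '{"port_filter": true, "ovs_hybrid_plug": true}'
  where vif_details not like '%ovs_hybrid_plug%'
  and vif_type = 'ovs';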

We're not sure if we missed a step in the Icehouse upgrade, or if this is
just a function of our particular configuration.  It may be that running
the ml2 plugin with the openvswitch mechanism driver and the ovs-agent is
now the correct setup, since that mechanism driver hardcodes
ovs_hybrid_plug=true:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/mech_openvswitch.py#L40
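
For reference, the driver sets that flag in the vif_details it registers at
construction; roughly (paraphrased from the link above, not verbatim):

  # rough paraphrase of neutron's mech_openvswitch.py
  class OpenvswitchMechanismDriver(mech_agent.SimpleAgentMechanismDriverBase):
      def __init__(self):
          super(OpenvswitchMechanismDriver, self).__init__(
              constants.AGENT_TYPE_OVS,
              portbindings.VIF_TYPE_OVS,
              {portbindings.CAP_PORT_FILTER: True,
               portbindings.OVS_HYBRID_PLUG: True})  # the hardcoded flag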

Hope this info is useful to somebody.

Mike (et al.)
