On Fri, Jul 19, 2013 at 8:47 AM, Kyle Mestery (kmestery) <kmest...@cisco.com> wrote:
> On Jul 18, 2013, at 5:16 PM, Aaron Rosen <aro...@nicira.com> wrote:
> >
> > Hi,
> >
> > I wanted to raise another design failure of why creating the port on nova-compute is bad. Previously, we have encountered this bug (https://bugs.launchpad.net/neutron/+bug/1160442). What was causing the issue was that when nova-compute calls into quantum to create the port, quantum creates the port but fails to return it to nova and instead times out. When this happens the instance is scheduled to run on another compute node, where another port is created with the same device_id, and when the instance boots it will look like it has two ports. This is still a problem that can occur today in our current implementation (!).
> >
> > I think in order to move forward with this we'll need to compromise. Here is my thought on how we should proceed.
> >
> > 1) Modify the quantum API so that mac addresses can now be updated via the api. There is no reason why we have this limitation (especially once the patch that uses dhcp_release is merged, as it will allow us to update the lease for the new mac immediately). We need to do this for bare metal support, as we need to match the mac address of the port to the compute node.
>
> I don't understand how this relates to creating a port through nova-compute. I'm not saying this is a bad idea, I just don't see how it relates to the original discussion point on this thread around Yong's patch.
>
> > 2) Move the port creation from nova-compute to nova-api. This will solve a number of issues like the one I pointed out above.
>
> This seems like a bad idea. So now a Nova API call will implicitly create a Neutron port? What happens on failure here? The caller isn't aware the port was created in Neutron if it's implicit, so who cleans things up? Or if the caller is aware, then all we've done is move an API call the caller would have made (nova-compute in this case) into nova-api, though the caller is now still aware of what's happening.

On failure here the VM will go to the ERROR state if the port fails to be created in quantum. Then, when deleting the instance, the delete code should also search quantum for the device_id in order to remove the port there as well. The issue here is that if an instance fails to boot on a compute node (because nova-compute did not get the port-create response from quantum even though the port was actually created), the instance gets scheduled to boot on another nova-compute node, where the duplicate create happens. Moving the creation to the API node keeps the retry logic from creating the port again, which solves this.

> > 3) For now, I'm okay with leaving logic on the compute node that calls update-port if the port binding extension is loaded. This will allow the vif type to be correctly set as well.
>
> And this will also still pass in the hostname the VM was booted on?

In this case there would have to be an update-port call done on the compute node which would set the hostname (which is the same case as live migration).

> To me, this thread seems to have diverged a bit from the original discussion point around Yong's patch. Yong's patch makes sense, because it's passing the hostname the VM is booted on during port create. It also updates the binding during a live migration, so that case is covered. Any change to this behavior should cover both those cases and not involve any sort of agent polling, IMHO.
>
> Thanks,
> Kyle
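To make the cleanup path I have in mind more concrete, here is a rough sketch (using python-neutronclient, or quantumclient as it was called until recently; the function name and auth plumbing are placeholders, not actual nova code) of what the instance-delete path could do to find and remove any ports left behind under the instance's device_id:

    # Rough sketch only, not actual nova code: on instance delete, look up
    # any quantum/neutron ports still carrying the instance's device_id and
    # remove them, so a port created by a request that timed out on the
    # nova side does not linger after the instance is rescheduled.
    from neutronclient.v2_0 import client as neutron_client

    def cleanup_instance_ports(instance_uuid, **auth_kwargs):
        """Delete any ports whose device_id matches the deleted instance."""
        # auth_kwargs stands in for username/password/tenant_name/auth_url;
        # the real auth plumbing depends on the deployment.
        neutron = neutron_client.Client(**auth_kwargs)

        # list_ports accepts filters, so ports created on behalf of this
        # instance can be found even if nova never saw the create response.
        ports = neutron.list_ports(device_id=instance_uuid).get('ports', [])
        for port in ports:
            neutron.delete_port(port['id'])
        return len(ports)

The same device_id filter is what exposes the duplicate-port symptom in the first place, so the delete path and the failure path stay symmetric.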
> > Thoughts/Comments?
> >
> > Thanks,
> >
> > Aaron
> >
> > On Mon, Jul 15, 2013 at 2:45 PM, Aaron Rosen <aro...@nicira.com> wrote:
> > >
> > > On Mon, Jul 15, 2013 at 1:26 PM, Robert Kukura <rkuk...@redhat.com> wrote:
> > > > On 07/15/2013 03:54 PM, Aaron Rosen wrote:
> > > > > On Sun, Jul 14, 2013 at 6:48 PM, Robert Kukura <rkuk...@redhat.com> wrote:
> > > > > > On 07/12/2013 04:17 PM, Aaron Rosen wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Fri, Jul 12, 2013 at 6:47 AM, Robert Kukura <rkuk...@redhat.com> wrote:
> > > > > > > > On 07/11/2013 04:30 PM, Aaron Rosen wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I think we should revert this patch that was added here (https://review.openstack.org/#/c/29767/). What this patch does is when nova-compute calls into quantum to create the port it passes in the hostname on which the instance was booted. The idea of the patch was that providing this information would "allow hardware device vendors management stations to allow them to segment the network in a more precise manner (for example automatically trunk the vlan on the physical switch port connected to the compute node on which the vm instance was started)."
> > > > > > > > >
> > > > > > > > > In my opinion I don't think this is the right approach. There are several other ways to get this information about where a specific port lives. For example, in the OVS plugin case the agent running on the nova-compute node can update the port in quantum to provide this information. Alternatively, quantum could query nova using the port.device_id to determine which server the instance is on.
> > > > > > > > >
> > > > > > > > > My motivation for removing this code is I now have the free cycles to work on https://blueprints.launchpad.net/nova/+spec/nova-api-quantum-create-port discussed here (http://lists.openstack.org/pipermail/openstack-dev/2013-May/009088.html). This was about moving the quantum port creation from the nova-compute host to nova-api if a network-uuid is passed in. This will allow us to remove all the quantum logic from the nova-compute nodes and simplify orchestration.
> > > > > > > > >
> > > > > > > > > Thoughts?
> > > > > > > >
> > > > > > > > Aaron,
> > > > > > > >
> > > > > > > > The ml2-portbinding BP I am currently working on depends on nova setting the binding:host_id attribute on a port before accessing binding:vif_type. The ml2 plugin's MechanismDrivers will use the binding:host_id with the agents_db info to see what (if any) L2 agent is running on that host, or what other networking mechanisms might provide connectivity for that host. Based on this, the port's binding:vif_type will be set to the appropriate type for that agent/mechanism.
> > > > > > > >
> > > > > > > > When an L2 agent is involved, the associated ml2 MechanismDriver will use the agent's interface or bridge mapping info to determine whether the agent on that host can connect to any of the port's network's segments, and select the specific segment (network_type, physical_network, segmentation_id) to be used.
> > > > > > > > If there is no connectivity possible on the host (due to either no L2 agent or other applicable mechanism, or no mapping for any of the network's segments' physical_networks), the ml2 plugin will set the binding:vif_type attribute to BINDING_FAILED. Nova will then be able to gracefully put the instance into an error state rather than have the instance boot without the required connectivity.
> > > > > > > >
> > > > > > > > I don't see any problem with nova creating the port before scheduling it to a specific host, but the binding:host_id needs to be set before the binding:vif_type attribute is accessed. Note that the host needs to be determined before the vif_type can be determined, so it is not possible to rely on the agent discovering the VIF, which can't be created until the vif_type is determined.
> > > > > > >
> > > > > > > So what you're saying is the current workflow is this: nova-compute creates a port in quantum passing in the host-id (which is the hostname of the compute host). Now quantum looks in the agent table in its database to determine the VIF type that should be used based on the agent that is running on the nova-compute node?
> > > > > >
> > > > > > Most plugins just return a hard-wired value for binding:vif_type. The ml2 plugin supports heterogeneous deployments, and therefore needs more flexibility, so this is what's being implemented in the agent-based ml2 mechanism drivers. Other mechanism drivers (i.e. controller-based) would work differently. In addition to VIF type selection, port binding in ml2 also involves determining if connectivity is possible, and selecting the network segment to use, and these are also based on binding:host_id.
> > > > >
> > > > > Can you go into more details about what you mean by heterogeneous deployments (i.e. what the topology looks like)? Why would connectivity not be possible? I'm confused why things would be configured in such a way that the scheduler wants to launch an instance on a node that quantum is not able to provide connectivity for.
> > > >
> > > > By heterogeneous deployment, I meant that all compute nodes are not necessarily identically configured. Some might be running the openvswitch agent, some the linuxbridge agent, and some the hyperv agent, but all able to access VLANs on (some of) the same trunks.
> > > >
> > > > One example of connectivity not being possible would be if multiple VLAN trunks are in use in the datacenter, but not all compute nodes have connections to every trunk.
> > > >
> > > > I agree the scheduler should ensure connectivity will be possible. But mechanisms such as cells, zones, and flavors can also be used in nova to manage heterogeneity. The ml2 port binding code should ideally never find out the scheduled node does not have connectivity, but we've at least defined what should happen if it does. The main need here though is for the port binding code to select the segment to use.
> > >
> > > Why does the port binding code select which segment to use? I'm unclear why anyone would ever have a deployment with a mix of vlans where things are trunked in some places and not in others and neutron would have to keep up with that. The part I'm unclear on is how neutron would be expected to behave in this type of setup.
> > > Say one boots several instances: instance1 lands on compute1 and neutron puts it on vlan X. Later instance2 is booted and it lands on compute2, where vlan X isn't reachable?
> > > > > > >
> > > > > > > My question would be why the nova-compute node doesn't already know which VIF_TYPE it should be using?
> > > > > >
> > > > > > I guess the thinking was that this knowledge belonged in quantum rather than nova, and thus the GenericVifDriver was introduced in grizzly. See https://blueprints.launchpad.net/nova/+spec/libvirt-vif-driver and https://blueprints.launchpad.net/neutron/+spec/vif-plugging-improvements.
> > > > >
> > > > > Thanks for the links. It seems like the motivation for this was to remove the libvirt vif configuration settings from nova and offload that to quantum via the vif_type param on a port. It seems like when using a specific plugin, that plugin will always return the same vif_type to a given node. This configuration option in my opinion looks best to be handled as part of your deployment automation instead and not baked into quantum ports.
> > > >
> > > > For monolithic plugins, returning a fixed vif_type works, but this is not sufficient for ml2.
> > > >
> > > > I was happy with the old approach of configuring drivers in nova (via deployment automation ideally), but the decision was made in grizzly to switch to the GenericVifDriver.
> > > > >
> > > > > My goal is to reduce the orchestration and complexity between nova and quantum. Currently, nova-api and nova-compute both call out to quantum when all of this could be done on the api node (ignoring bare metal for now, as in this case we'd need to do something special to handle updating the mac addresses on those logical ports in quantum).
> > > >
> > > > Sounds like the scheduler is going to need to call neutron as well, at least in some cases.
> > >
> > > Why is this? The only use case I see so far for something other than nova-api to call into neutron would be bare metal. I think having neutron tell nova which vif type it should be using is really tightly coupling nova+quantum integration. I think we should probably reexamine https://blueprints.launchpad.net/nova/+spec/libvirt-vif-driver as setting the libvirt_type from the neutron side seems to be something that the sysadmin should configure once and not have to rely on neutron to specify.
> > >
> > > Thanks,
> > >
> > > Aaron
> > > >
> > > > -Bob
> > > > > >
> > > > > > -Bob
> > > > > > > >
> > > > > > > > Back when the port binding extension was originally being hashed out, I had suggested using an explicit bind() operation on port that took the host_id as a parameter and returned the vif_type as a result. But the current attribute-based approach was chosen instead. We could consider adding a bind() operation for the next neutron API revision, but I don't see any reason the current attribute-based binding approach cannot work for now.
> > > > > > > >
> > > > > > > > -Bob
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > >
> > > > > > > > > Aaron
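Coming back to the binding discussion quoted above: the attribute-based flow Bob describes (set binding:host_id, then read back binding:vif_type) would look roughly like the following from nova's side. This is only an illustrative sketch using python-neutronclient; the function name is made up, and the 'binding_failed' value in the check is my assumption for how a failed binding surfaces, mirroring the BINDING_FAILED behavior described above rather than quoting actual nova or ml2 code.

    # Illustrative sketch of the attribute-based "bind" flow discussed above:
    # nova tells neutron which host the instance landed on, then reads back
    # the vif_type that the plugin (e.g. ml2 and its mechanism drivers)
    # selected for that host. Names here are placeholders, not nova code.
    from neutronclient.v2_0 import client as neutron_client

    def bind_port_to_host(neutron, port_id, host):
        """Set binding:host_id and read back the resulting binding:vif_type."""
        # Setting binding:host_id is what lets a plugin like ml2 consult its
        # agents_db (interface/bridge mappings) and pick a segment for this host.
        neutron.update_port(port_id, {'port': {'binding:host_id': host}})

        # Read the binding back; with ml2 the vif_type reflects whichever
        # agent/mechanism can provide connectivity on that host.
        port = neutron.show_port(port_id)['port']
        vif_type = port.get('binding:vif_type')

        # 'binding_failed' is assumed here to be how a failed binding shows up;
        # nova could then put the instance into an error state instead of
        # booting it without the required connectivity.
        if vif_type == 'binding_failed':
            raise RuntimeError('port %s cannot be bound on host %s'
                               % (port_id, host))
        return vif_type

    # Example use (client construction and auth elided):
    # neutron = neutron_client.Client(username='...', password='...',
    #                                 tenant_name='...', auth_url='...')
    # vif_type = bind_port_to_host(neutron, port_id, 'compute-node-1')

An explicit bind() call that takes host_id and returns vif_type, as suggested above, would collapse the update and the read-back into a single request, which is the trade-off being discussed.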
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev