On Jul 19, 2013, at 1:58 PM, Aaron Rosen <aro...@nicira.com> wrote:
>
> On Fri, Jul 19, 2013 at 8:47 AM, Kyle Mestery (kmestery) <kmest...@cisco.com> wrote:
> On Jul 18, 2013, at 5:16 PM, Aaron Rosen <aro...@nicira.com> wrote:
> >
> > Hi,
> >
> > I wanted to raise another design failure of why creating the port on nova-compute is bad. Previously, we encountered this bug (https://bugs.launchpad.net/neutron/+bug/1160442). What was causing the issue was that when nova-compute calls into quantum to create the port, quantum creates the port but fails to return it to nova and instead times out. When this happens the instance is scheduled to run on another compute node, where another port is created with the same device_id, and when the instance boots it will look like it has two ports. This is still a problem that can occur today in our current implementation (!).
> >
> > I think in order to move forward with this we'll need to compromise. Here is my thought on how we should proceed.
> >
> > 1) Modify the quantum API so that mac addresses can now be updated via the api. There is no reason why we have this limitation (especially once the patch that uses dhcp_release is merged, as it will allow us to update the lease for the new mac immediately). We need to do this for bare metal support, as we need to match the mac address of the port to the compute node.
>
> I don't understand how this relates to creating a port through nova-compute. I'm not saying this is a bad idea, I just don't see how it relates to the original discussion point on this thread around Yong's patch.
>
> > 2) Move the port creation from nova-compute to nova-api. This will solve a number of issues like the one I pointed out above.
>
> This seems like a bad idea. So now a Nova API call will implicitly create a Neutron port? What happens on failure here? The caller isn't aware the port was created in Neutron if it's implicit, so who cleans things up? Or if the caller is aware, then all we've done is move an API call the caller would have made (nova-compute in this case) into nova-api, though the caller is now still aware of what's happening.
>
> On failure here the VM will go to ERROR state if the port fails to be created in quantum. Then, when deleting the instance, the delete code should also search quantum for the device_id in order to remove the port there as well.

So, nova-compute will implicitly know the port was created by nova-api, and if a failure happens, it will clean up the port? That doesn't sound like a balanced solution to me, and seems to tie nova-compute and nova-api closely together when it comes to launching VMs with Neutron ports.
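To make the cleanup path being discussed concrete, a rough sketch of delete-time cleanup by device_id with python-neutronclient might look like the snippet below. The helper name, the client setup, and the missing error handling are purely illustrative; this is not the actual nova code.

    from neutronclient.v2_0 import client as neutron_client

    # neutron = neutron_client.Client(username=..., password=...,
    #                                 tenant_name=..., auth_url=...)

    def cleanup_ports_for_instance(neutron, instance_uuid):
        # Ports that quantum created but nova-compute never heard back about
        # (e.g. the create timed out) are still tagged with the instance's
        # uuid as device_id, so they can be found and removed on delete.
        orphaned = neutron.list_ports(device_id=instance_uuid).get('ports', [])
        for port in orphaned:
            neutron.delete_port(port['id'])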
> The issue here is that if an instance fails to boot on a compute node (because nova-compute did not get the port-create response from quantum and the port was actually created), the instance gets scheduled to be booted on another nova-compute node, where the duplicate create happens. Moving the creation to the API node keeps the port from getting created again by the retry logic, which solves this.

I think Ian's comments on your blueprint [1] address this exact problem, can you take a look at them there?

[1] https://blueprints.launchpad.net/nova/+spec/nova-api-quantum-create-port

> > 3) For now, I'm okay with leaving logic on the compute node that calls update-port if the port binding extension is loaded. This will allow the vif type to be correctly set as well.
>
> And this will also still pass in the hostname the VM was booted on?
>
> In this case there would have to be an update-port call done on the compute node which would set the hostname (which is the same case as live migration).

Just to be sure I understand, nova-compute will do this, or will this be the responsibility of some neutron agent?

Thanks,
Kyle

> To me, this thread seems to have diverged a bit from the original discussion point around Yong's patch. Yong's patch makes sense, because it's passing the hostname the VM is booted on during port create. It also updates the binding during a live migration, so that case is covered. Any change to this behavior should cover both those cases and not involve any sort of agent polling, IMHO.
>
> Thanks,
> Kyle
>
> > Thoughts/Comments?
> >
> > Thanks,
> >
> > Aaron
> >
> > On Mon, Jul 15, 2013 at 2:45 PM, Aaron Rosen <aro...@nicira.com> wrote:
> >
> > On Mon, Jul 15, 2013 at 1:26 PM, Robert Kukura <rkuk...@redhat.com> wrote:
> > On 07/15/2013 03:54 PM, Aaron Rosen wrote:
> > >
> > > On Sun, Jul 14, 2013 at 6:48 PM, Robert Kukura <rkuk...@redhat.com> wrote:
> > >
> > > On 07/12/2013 04:17 PM, Aaron Rosen wrote:
> > > > Hi,
> > > >
> > > > On Fri, Jul 12, 2013 at 6:47 AM, Robert Kukura <rkuk...@redhat.com> wrote:
> > > >
> > > > On 07/11/2013 04:30 PM, Aaron Rosen wrote:
> > > > > Hi,
> > > > >
> > > > > I think we should revert this patch that was added here (https://review.openstack.org/#/c/29767/). What this patch does is, when nova-compute calls into quantum to create the port, it passes in the hostname on which the instance was booted. The idea of the patch was that providing this information would "allow hardware device vendors management stations to allow them to segment the network in a more precise manner (for example automatically trunk the vlan on the physical switch port connected to the compute node on which the vm instance was started)."
> > > > >
> > > > > In my opinion I don't think this is the right approach. There are several other ways to get this information of where a specific port lives. For example, in the OVS plugin case the agent running on the nova-compute node can update the port in quantum to provide this information. Alternatively, quantum could query nova using the port.device_id to determine which server the instance is on.
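For reference, the update-port call being discussed (whether issued from nova-compute or by an agent on that host) would look roughly like the sketch below with python-neutronclient. binding:host_id is the attribute from the port binding extension; the function name and everything around it are made up for illustration.

    def set_port_host(neutron, port_id, hostname):
        # Record which compute host the VIF lives on. Plugins that load the
        # port binding extension use this to pick an appropriate vif_type
        # (and, for ml2, a network segment). Live migration would issue the
        # same update with the destination host.
        body = {'port': {'binding:host_id': hostname}}
        return neutron.update_port(port_id, body)['port']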
> > > > > My motivation for removing this code is that I now have the free cycles to work on https://blueprints.launchpad.net/nova/+spec/nova-api-quantum-create-port, discussed here (http://lists.openstack.org/pipermail/openstack-dev/2013-May/009088.html). This was about moving the quantum port creation from the nova-compute host to nova-api if a network-uuid is passed in. This will allow us to remove all the quantum logic from the nova-compute nodes and simplify orchestration.
> > > > >
> > > > > Thoughts?
> > > >
> > > > Aaron,
> > > >
> > > > The ml2-portbinding BP I am currently working on depends on nova setting the binding:host_id attribute on a port before accessing binding:vif_type. The ml2 plugin's MechanismDrivers will use the binding:host_id with the agents_db info to see what (if any) L2 agent is running on that host, or what other networking mechanisms might provide connectivity for that host. Based on this, the port's binding:vif_type will be set to the appropriate type for that agent/mechanism.
> > > >
> > > > When an L2 agent is involved, the associated ml2 MechanismDriver will use the agent's interface or bridge mapping info to determine whether the agent on that host can connect to any of the port's network's segments, and select the specific segment (network_type, physical_network, segmentation_id) to be used. If no connectivity is possible on the host (due to either no L2 agent or other applicable mechanism, or no mapping for any of the network's segments' physical_networks), the ml2 plugin will set the binding:vif_type attribute to BINDING_FAILED. Nova will then be able to gracefully put the instance into an error state rather than have the instance boot without the required connectivity.
> > > >
> > > > I don't see any problem with nova creating the port before scheduling it to a specific host, but the binding:host_id needs to be set before the binding:vif_type attribute is accessed. Note that the host needs to be determined before the vif_type can be determined, so it is not possible to rely on the agent discovering the VIF, which can't be created until the vif_type is determined.
> > > >
> > > > So what you're saying is the current workflow is this: nova-compute creates a port in quantum, passing in the host-id (which is the hostname of the compute host). Now quantum looks in the agent table in its database to determine the VIF type that should be used, based on the agent that is running on the nova-compute node?
> > >
> > > Most plugins just return a hard-wired value for binding:vif_type. The ml2 plugin supports heterogeneous deployments, and therefore needs more flexibility, so this is what's being implemented in the agent-based ml2 mechanism drivers. Other mechanism drivers (i.e. controller-based) would work differently. In addition to VIF type selection, port binding in ml2 also involves determining if connectivity is possible, and selecting the network segment to use, and these are also based on binding:host_id.
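To make the binding flow above concrete, the agent-based logic Bob describes boils down to roughly the sketch below. This is simplified, illustrative Python, not the actual ml2 mechanism driver code, and the data structures (agents_by_host, the mappings dict, the vif_type field) are invented for the example.

    VIF_TYPE_BINDING_FAILED = 'binding_failed'

    def bind_port(host_id, agents_by_host, network_segments):
        # Look up the L2 agent (if any) reported alive on the scheduled host.
        agent = agents_by_host.get(host_id)
        if agent is None:
            return VIF_TYPE_BINDING_FAILED, None
        # Pick the first of the network's segments whose physical_network the
        # agent's bridge/interface mappings can actually reach.
        for segment in network_segments:
            if segment['physical_network'] in agent['mappings']:
                return agent['vif_type'], segment
        # No reachable segment on this host: nova can put the instance in ERROR
        # instead of booting it without connectivity.
        return VIF_TYPE_BINDING_FAILED, None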
> > > Can you go into more details about what you mean by heterogeneous deployments (i.e. what the topology looks like)? Why would connectivity not be possible? I'm confused why things would be configured in such a way that the scheduler wants to launch an instance on a node where quantum is not able to provide connectivity.
> >
> > By heterogeneous deployment, I meant that all compute nodes are not necessarily identically configured. Some might be running the openvswitch agent, some the linuxbridge agent, and some the hyperv agent, but all able to access VLANs on (some of) the same trunks.
> >
> > One example of connectivity not being possible would be if multiple VLAN trunks are in use in the datacenter, but not all compute nodes have connections to every trunk.
> >
> > I agree the scheduler should ensure connectivity will be possible. But mechanisms such as cells, zones, and flavors can also be used in nova to manage heterogeneity. The ml2 port binding code should ideally never find out the scheduled node does not have connectivity, but we've at least defined what should happen if it does. The main need here, though, is for the port binding code to select the segment to use.
> >
> > Why does the port binding code select which segment to use? I'm unclear why anyone would ever have a deployment with a mix of vlans where things are trunked in some places and not in others, and neutron would have to keep up with that. The part I'm unclear on is how neutron would be expected to behave in this type of setup. Say one boots several instances: instance1 lands on compute1 and neutron puts it on vlan X. Later, instance2 is booted and it lands on compute2, where vlan X isn't reachable?
> > > >
> > > > My question would be why the nova-compute node doesn't already know which VIF_TYPE it should be using?
> > >
> > > I guess the thinking was that this knowledge belonged in quantum rather than nova, and thus the GenericVifDriver was introduced in grizzly. See https://blueprints.launchpad.net/nova/+spec/libvirt-vif-driver and https://blueprints.launchpad.net/neutron/+spec/vif-plugging-improvements.
> > >
> > > Thanks for the links. It seems like the motivation for this was to remove the libvirt vif configuration settings from nova and offload that to quantum via the vif_type param on a port. It seems like when using a specific plugin, that plugin will always return the same vif_type for a given node. This configuration option, in my opinion, looks best handled as part of your deployment automation instead and not baked into quantum ports.
> >
> > For monolithic plugins, returning a fixed vif_type works, but this is not sufficient for ml2.
> >
> > I was happy with the old approach of configuring drivers in nova (via deployment automation ideally), but the decision was made in grizzly to switch to the GenericVifDriver.
> > >
> > > My goal is to reduce the orchestration and complexity between nova and quantum.
> > > Currently, nova-api and nova-compute both call out to quantum when all of this could be done on the api node (ignoring bare metal for now as in this case we'd need to do something special to handle updating the mac addresses on those logical ports in quantum).
> >
> > Sounds like the scheduler is going to need to call neutron as well, at least in some cases.
> >
> > Why is this? The only use case I see so far for something other than nova-api to call into neutron would be bare metal. I think having neutron telling nova which vif type it should be using is really tightly coupling nova+quantum integration. I think we should probably reexamine https://blueprints.launchpad.net/nova/+spec/libvirt-vif-driver as setting the libvirt_type from the neutron side seems to be something that the sysadmin should configure once and not have to rely on neutron to specify.
> >
> > Thanks,
> >
> > Aaron
> >
> > -Bob
> > >
> > > -Bob
> > > >
> > > > Back when the port binding extension was originally being hashed out, I had suggested using an explicit bind() operation on port that took the host_id as a parameter and returned the vif_type as a result. But the current attribute-based approach was chosen instead. We could consider adding a bind() operation for the next neutron API revision, but I don't see any reason the current attribute-based binding approach cannot work for now.
> > > >
> > > > -Bob
> > > > >
> > > > > Best,
> > > > >
> > > > > Aaron
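Coming back to Bob's last point, the difference between today's attribute-based binding and an explicit bind() operation amounts to roughly the sketch below with python-neutronclient; bind_port() is hypothetical and shown only for comparison, it is not part of the current API.

    def read_bound_vif_type(neutron, port_id):
        # Attribute-based binding as it works today: after binding:host_id has
        # been set (see the earlier update-port sketch), the plugin records its
        # choice in binding:vif_type, which the caller reads back off the port.
        return neutron.show_port(port_id)['port'].get('binding:vif_type')

    # Bob's original suggestion was an explicit operation instead, roughly:
    #   vif_type = neutron.bind_port(port_id, host_id)   # hypothetical call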