I've had a terrible time getting the community to help me with this problem. So special thanks to Darragh O'Reilly, and to rkeene on #openstack, who was mean and a bit of a wisenheimer (I'd use different words elsewhere), but who at least talked to me and got me to think twice about my GRE setup.
But enough of that: the problem is solved and a bug report has been submitted: https://bugs.launchpad.net/quantum/+bug/1179223. I added an "s" to the front of "persists" in the subject, but whatever. I always leave one thing in the hotel room, and I always leave one embarrassing typo. Here's the part explaining how it was fixed:

SOLUTION:

mysql> delete from ovs_tunnel_endpoints where id = 1;
Query OK, 1 row affected (0.00 sec)

mysql> select * from ovs_tunnel_endpoints;
+-----------------+----+
| ip_address      | id |
+-----------------+----+
| 192.168.239.110 |  3 |
| 192.168.239.114 |  4 |
| 192.168.239.115 |  5 |
| 192.168.239.99  |  2 |
+-----------------+----+
4 rows in set (0.00 sec)

* After doing that, I simply restarted the quantum OVS agents on the
network and compute nodes. The old GRE tunnel is not re-created, and
VM network traffic to and from the external network now proceeds
without incident. (Rough restart commands are sketched at the bottom
of this mail, below the quoted thread.)

* Should these tables be cleaned up as well, I wonder? (A sanity-check
query and the corrected plugin stanza are also sketched at the bottom.)

mysql> select * from ovs_network_bindings;
+--------------------------------------+--------------+------------------+-----------------+
| network_id                           | network_type | physical_network | segmentation_id |
+--------------------------------------+--------------+------------------+-----------------+
| 4e8aacca-8b38-40ac-a628-18cac3168fe6 | gre          | NULL             |               2 |
| af224f3f-8de6-4e0d-b043-6bcd5cb014c5 | gre          | NULL             |               1 |
+--------------------------------------+--------------+------------------+-----------------+
2 rows in set (0.00 sec)

mysql> select * from ovs_tunnel_allocations where allocated != 0;
+-----------+-----------+
| tunnel_id | allocated |
+-----------+-----------+
|         1 |         1 |
|         2 |         1 |
+-----------+-----------+
2 rows in set (0.00 sec)

Cheers, and happy openstacking. Even you, rkeene!

--Greg Chavez

On Sat, May 11, 2013 at 2:28 PM, Greg Chavez <greg.cha...@gmail.com> wrote:
> So to be clear:
>
> * I have three NICs on my network node. VM traffic goes out the 1st
> NIC on 192.168.239.99/24 to the other compute nodes, while management
> traffic goes out the 2nd NIC on 192.168.241.99. The 3rd NIC is
> external and has no IP.
>
> * I have four GRE endpoints on the VM network, one at the network node
> (192.168.239.99) and three on compute nodes
> (192.168.239.{110,114,115}), all with IDs 2-5.
>
> * I have a fifth GRE endpoint with id 1 to 192.168.241.99, the network
> node's management interface. This was the first tunnel created when I
> deployed the network node, because that is how I set the remote_ip in
> the ovs plugin ini. I corrected the setting later, but the
> 192.168.241.99 endpoint persists and, as your response implies, *this
> extraneous endpoint is the cause of my troubles*.
>
> My next question then is: what is happening? My guess:
>
> * I ping a guest from the external network using its floater (10.21.166.4).
>
> * It gets NAT'd at the tenant router on the network node to
> 192.168.252.3, at which point an arp request is sent over the unified
> GRE broadcast domain.
>
> * On a compute node, the arp request is received by the VM, which then
> sends a reply to the tenant router's MAC (which I verified with
> tcpdumps).
> * There are four endpoints for the packet to go down:
>
>     Bridge br-tun
>         Port br-tun
>             Interface br-tun
>                 type: internal
>         Port "gre-1"
>             Interface "gre-1"
>                 type: gre
>                 options: {in_key=flow, out_key=flow, remote_ip="192.168.241.99"}
>         Port "gre-4"
>             Interface "gre-4"
>                 type: gre
>                 options: {in_key=flow, out_key=flow, remote_ip="192.168.239.114"}
>         Port "gre-3"
>             Interface "gre-3"
>                 type: gre
>                 options: {in_key=flow, out_key=flow, remote_ip="192.168.239.110"}
>         Port patch-int
>             Interface patch-int
>                 type: patch
>                 options: {peer=patch-tun}
>         Port "gre-2"
>             Interface "gre-2"
>                 type: gre
>                 options: {in_key=flow, out_key=flow, remote_ip="192.168.239.99"}
>
> Here's where I get confused. Does it know that gre-1 is a different
> broadcast domain than the others, or does it see all endpoints as the
> same domain?
>
> What happens here? Is this the cause of my network timeouts on
> external connections to the VMs? Does this also explain the sporadic
> nature of the timeouts, why they aren't consistent in frequency or
> duration?
>
> Finally, what happens when I remove the oddball endpoint from the DB?
> Sounds risky!
>
> Thanks for your help.
> --Greg Chavez
>
> On Fri, May 10, 2013 at 7:17 PM, Darragh O'Reilly
> <dara2002-openst...@yahoo.com> wrote:
>> I'm not sure how to rectify that. You may have to delete the bad row from
>> the DB and restart the agents:
>>
>> mysql> use quantum;
>> mysql> select * from ovs_tunnel_endpoints;
>> ...
>>
>> On Fri, May 10, 2013 at 6:43 PM, Greg Chavez <greg.cha...@gmail.com> wrote:
>>> I'm refactoring my question once again (see "A Grizzly arping
>>> failure" and "Failure to arp by quantum router").
>>>
>>> Quickly, the problem is in a multi-node Grizzly+Raring setup with a
>>> separate network node and a dedicated VLAN for VM traffic. External
>>> connections time out within a minute and don't resume until traffic
>>> is initiated from the VM.
>>>
>>> I got some rather annoying and hostile assistance just now on IRC,
>>> and while it didn't result in a fix, it got me to realize that the
>>> problem is possibly with my GRE setup.
>>>
>>> I made a mistake when I originally set this up, assigning the mgmt
>>> interface of the network node (192.168.241.99) as its GRE remote_ip
>>> instead of the vm_config network interface (192.168.239.99). I
>>> realized my mistake, reconfigured the OVS plugin on the network
>>> node, and moved on. But now, taking a look at my OVS bridges on the
>>> network node, I see that the old remote IP is still there!
>>>
>>>     Bridge br-tun
>>>         <snip>
>>>         Port "gre-1"
>>>             Interface "gre-1"
>>>                 type: gre
>>>                 options: {in_key=flow, out_key=flow, remote_ip="192.168.241.99"}
>>>         <snip>
>>>
>>> This gre-1 port is also on all the compute nodes.
>>>
>>> (Full ovs-vsctl show output here: http://pastebin.com/xbre1fNV)
>>>
>>> What's more, I get this error every time I restart OVS:
>>>
>>> 2013-05-10 18:21:24 ERROR [quantum.agent.linux.ovs_lib] Unable to
>>> execute ['ovs-vsctl', '--timeout=2', 'add-port', 'br-tun', 'gre-5'].
>>> Exception:
>>> Command: ['sudo', 'quantum-rootwrap', '/etc/quantum/rootwrap.conf',
>>> 'ovs-vsctl', '--timeout=2', 'add-port', 'br-tun', 'gre-5']
>>> Exit code: 1
>>> Stdout: ''
>>> Stderr: 'ovs-vsctl: cannot create a port named gre-5 because a port
>>> named gre-5 already exists on bridge br-tun\n'
>>>
>>> Could that be because gre-1 is vestigial and possibly fouling up the
>>> works by creating two possible paths for VM traffic?
>>> Is it as simple as removing it with ovs-vsctl, or is something else
>>> required?
>>>
>>> Or is this actually needed for some reason? Argh... help!
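A couple of follow-up sketches for anyone who lands on this thread from the bug report; they reflect my Grizzly+Raring install, so adjust names and paths for yours.

First, the agent restarts after deleting the row. This is only a sketch: I'm assuming the Ubuntu packaging, where the OVS agent service is called quantum-plugin-openvswitch-agent, and the ovs-vsctl step is only needed if the agent doesn't drop the stale port on its own.

    # On the network node and on each compute node, restart the OVS quantum
    # agent so it re-syncs its tunnel ports against the DB.
    sudo service quantum-plugin-openvswitch-agent restart

    # If the vestigial port is still on br-tun afterwards, drop it by hand
    # (gre-1 was the bad one in my case).
    sudo ovs-vsctl --if-exists del-port br-tun gre-1

    # Confirm that only the expected GRE ports remain.
    sudo ovs-vsctl show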
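Second, on my own "should these tables be cleaned up as well" question: as far as I can tell from the Grizzly OVS plugin schema, ovs_tunnel_allocations holds the per-network GRE keys (segmentation IDs), not per-host endpoints, so those rows just need to line up with ovs_network_bindings and shouldn't need touching. A quick sanity check, again just a sketch:

    -- Every allocated tunnel_id should be the segmentation_id of some gre
    -- network; a row with a NULL network_id would be a leftover to clean up.
    use quantum;
    select a.tunnel_id, b.network_id
      from ovs_tunnel_allocations a
      left join ovs_network_bindings b on b.segmentation_id = a.tunnel_id
     where a.allocated != 0;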
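And finally, since the root cause of all this was the tunnel IP I originally put in the OVS plugin ini, here is roughly what the corrected stanza looks like on the network node. Again a sketch from memory of the Grizzly openvswitch plugin (the local_ip you set on each node is what shows up as remote_ip on its peers), so verify the path and option names against your own install:

    # /etc/quantum/plugins/openvswitch/ovs_quantum_plugin.ini
    [OVS]
    tenant_network_type = gre
    enable_tunneling = True
    tunnel_id_ranges = 1:1000
    # Must be the VM-traffic NIC on 192.168.239.0/24, not the mgmt NIC.
    local_ip = 192.168.239.99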