Here's my rough draft: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Q-in-Q+for+isolated+networks+functional+spec
On Sun, Oct 21, 2012 at 11:41 PM, Chiradeep Vittal <chiradeep.vit...@citrix.com> wrote:
> +1 on the FS.
>
> On 10/20/12 10:52 PM, "Marcus Sorensen" <shadow...@gmail.com> wrote:
>
>>The admin does have to create a new physical network, the patch just
>>allows you to use a tagged network as that physical network rather
>>than a real eth device. It is true that cloudstack doesn't know about
>>q-in-q per se, but it is the one creating the q-in-q vlans. The admin
>>does have to create any "vlan#" devs to be used, but I think that
>>makes sense since cloudstack doesn't manage any of your physical
>>network devices. Perhaps I need to write a bit of a functional spec
>>just to describe it in more detail.
>>
>>I haven't done anything with it in regards to xen, of course that
>>would also be a different patch since it hits different code. If
>>someone knows that code well maybe they can help. This is a simple
>>patch, but it's made possible by a previous patch that reworks how the
>>bridges are named, so enabling it for xen might not be as simple as
>>this makes it look.
>>
>>On Sat, Oct 20, 2012 at 10:57 PM, Chiradeep Vittal
>><chiradeep.vit...@citrix.com> wrote:
>>> It looks like your patch does not require the admin to configure
>>> anything wrt physical networks. The admin knows the list of "outer"
>>> VLANs and CloudStack is blissfully unaware of the QinQ stuff.
>>> This requires the hypervisors to be independently configured
>>> (out-of-band) with the outer VLAN bridges?
>>> It also looks like this is a KVM-only solution.
>>> Have you tried this with XS?
>>>
>>> On 10/18/12 6:21 PM, "Marcus Sorensen" <shadow...@gmail.com> wrote:
>>>
>>>>Ah, well it's pretty simple, so I'll just paste it here. Again,
>>>>perhaps more should be implemented regarding the MTU (like
>>>>functionality to configure MTU on the virtual router), but if you know
>>>>what to do it can all work via switch configs.
>>>>
>>>>diff --git a/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java b/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java
>>>>index 1bc70fa..70de3db 100755
>>>>--- a/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java
>>>>+++ b/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java
>>>>@@ -800,7 +800,7 @@ public class LibvirtComputingResource extends ServerResourceBase implements
>>>>         String pif = Script.runSimpleBashScript("brctl show | grep " + bridge + " | awk '{print $4}'");
>>>>         String vlan = Script.runSimpleBashScript("ls /proc/net/vlan/" + pif);
>>>>
>>>>-        if (vlan != null && !vlan.isEmpty()) {
>>>>+        if (vlan != null && !vlan.isEmpty() && (!pif.startsWith("vlan") || pif.matches("vlan\\d+\\.\\d+"))) {
>>>>             pif = Script.runSimpleBashScript("grep ^Device\\: /proc/net/vlan/" + pif + " | awk {'print $2'}");
>>>>         }
>>>>
>>>>On Thu, Oct 18, 2012 at 8:05 AM, Chip Childers
>>>><chip.child...@sungard.com> wrote:
>>>>> On Thu, Oct 18, 2012 at 12:42 AM, Marcus Sorensen
>>>>> <shadow...@gmail.com> wrote:
>>>>>> Sorry, I've been up to my ears. I've attached the simple patch that
>>>>>> makes this all happen, if anyone wants to take a look. This is the
>>>>>> code that looks for physical devices. It's passed a bridge and then
>>>>>> determines the parent of that bridge, then whether that parent is a
>>>>>> tagged device and goes one more step and finds its parent. This just
>>>>>> circumvents the last lookup if the parent of the bridge is a "vlan"
>>>>>> device (single tagged, e.g. vlan100) but not a double-tagged one
>>>>>> (e.g. vlan100.10), and the rest of cloudstack treats vlan100 as
>>>>>> though it were a physical device, creates tagged bridges on it if it
>>>>>> has guest traffic type, etc.
>>>>>> I've been using it in our test bed for about a month, and have only
>>>>>> run into the MTU issue.
>>>>>
>>>>> Hey Marcus,
>>>>>
>>>>> Attachments get stripped. Can you post it somewhere?
>>>>>
>>>>>> If people still think it's a good idea, I'll create a functional spec
>>>>>> and additional info on how it works.
>>>>>>
>>>>>> I've also got a small patch to modifyvlans.sh, but I'm debating
>>>>>> whether or not it's necessary. It detects whether the "physical
>>>>>> interface" is actually a vlan tagged interface, and if so it subtracts
>>>>>> the necessary bytes from the MTU when it sets up the double-tagged
>>>>>> bridges. It's technically not necessary, as the important part is
>>>>>> whether the guest MTUs fit inside the MTU that the switch allows once
>>>>>> the extra tag is added. But it just makes it a bit more obvious as to
>>>>>> what's needed. However it also breaks the admin's ability to bump the
>>>>>> switch MTUs up just a bit, say 1532, to account for the excess without
>>>>>> having to go up to 9000 or full jumbo. If anyone is a network guru and
>>>>>> has any feedback it would be appreciated, but I'm inclined to leave
>>>>>> the MTUs alone and write it into the functional spec that a switch
>>>>>> with a 1500 MTU supports double tags up to 1468, and a switch with a
>>>>>> 9000 MTU supports VM guest networks up to 8968 MTU.
>>>>>>
>>>>>> On Mon, Oct 15, 2012 at 1:43 PM, Marcus Sorensen
>>>>>> <shadow...@gmail.com> wrote:
>>>>>>> Ok, I'll pull out the changes and let people see them. Cloudstack
>>>>>>> seems to let me put the same vlan ranges on multiple physicals,
>>>>>>> though I haven't done much actual testing with large numbers of
>>>>>>> vlans. I imagine there would be other bottlenecks if they all needed
>>>>>>> to be up on the same host at once. Luckily we only create bridges for
>>>>>>> the actual VMs on the box so it should scale reasonably.
>>>>>>>
>>>>>>> The only caveat I've run into so far is that you either need to be
>>>>>>> running jumbo frames on your switches, or turn down the MTU on the
>>>>>>> guests a bit to accommodate the space taken by the extra tag. If you
>>>>>>> wanted to run jumbo frames on the guests as well, you'd run into the
>>>>>>> same situation and have to use slightly less than the 9000 (although
>>>>>>> the virtual router would require a patch too for the new size).
>>>>>>>
>>>>>>> On Mon, Oct 15, 2012 at 9:56 AM, Ahmad Emneina
>>>>>>> <ahmad.emne...@citrix.com> wrote:
>>>>>>>> On 10/15/12 8:35 AM, "Kelceydamage@bbits" <kel...@bbits.ca> wrote:
>>>>>>>>
>>>>>>>>>That's a far more elegant way than I tried, which was creating
>>>>>>>>>tagged interfaces within guests.
>>>>>>>>>
>>>>>>>>>Sent from my iPhone
>>>>>>>>>
>>>>>>>>>On Oct 15, 2012, at 12:54 AM, Chiradeep Vittal
>>>>>>>>><chiradeep.vit...@citrix.com> wrote:
>>>>>>>>>
>>>>>>>>>> This sounds like it can be modeled as multiple physical networks?
>>>>>>>>>> That is, each "outer" vlan (400, 401, etc) is a separate physical
>>>>>>>>>> network in the same zone. That could work, although it is probable
>>>>>>>>>> that the zone configuration API bits prevent more than 4k VLANs per
>>>>>>>>>> zone (that can be changed to per physical network).
>>>>>>>>>>
>>>>>>>>>> As long as communication between guests on different physical
>>>>>>>>>> networks happens via the public network, it should be Ok.
>>>>>>>>>> I'd like to see the patch.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> On 10/12/12 1:09 AM, "Marcus Sorensen" <shadow...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Guys, in looking for a free and scalable way to provide private
>>>>>>>>>>> networks for customers I've been running a QinQ setup that has
>>>>>>>>>>> been working quite well.
>>>>>>>>>>> I've sort of laid the groundwork for it already in changing the
>>>>>>>>>>> bridge naming conventions about a month ago for KVM (to names that
>>>>>>>>>>> won't collide if the same vlan is used twice on different phys).
>>>>>>>>>>>
>>>>>>>>>>> Basically the way it works is like this. Linux has two ways of
>>>>>>>>>>> creating tagged networks, the eth#.# and the less used vlan#
>>>>>>>>>>> network devices. I have a tiny patch that causes cloudstack to
>>>>>>>>>>> treat vlan# devs as though they were physical NICs. In this way,
>>>>>>>>>>> you can do something like physical devices eth0, eth1, and vlan400.
>>>>>>>>>>> Management traffic on eth0's bridge, storage on eth1.102's bridge,
>>>>>>>>>>> maybe eth1.103 for public/guest, then create say a vlan400 that is
>>>>>>>>>>> tag 400 on eth1. You add a traffic type of guest to it and give it
>>>>>>>>>>> a vlan range, say 10-4000. Then you end up with cloudstack handing
>>>>>>>>>>> out vlan400.10, vlan400.11, etc for guest networks. Works great for
>>>>>>>>>>> network isolation without burning through a bunch of your "real"
>>>>>>>>>>> vlans. In the unlikely event that you run out, you just create a
>>>>>>>>>>> physical vlan401 and start over with the vlan numbers.
>>>>>>>>>>>
>>>>>>>>>>> In theory all-you-can-eat isolated networks without having to
>>>>>>>>>>> configure hundreds of vlans on your networking equipment. This may
>>>>>>>>>>> require additional config on any upstream switches to pass the
>>>>>>>>>>> double tags around, but in general from what I've seen the inner
>>>>>>>>>>> tags just pass through on anything layer 2. It should only get
>>>>>>>>>>> tricky if you try to tunnel, route or strip tags.
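The naming scheme and the one-line patch quoted in this thread can be restated as a small standalone sketch. It is not CloudStack code: the class `QinQDeviceNames`, its method names, and the `isVlanDevice` flag (standing in for the `/proc/net/vlan/<pif>` lookup in the patch) are assumptions made for illustration. Only the boolean condition itself is taken from the diff.

```java
import java.util.regex.Pattern;

/** Illustrative sketch of the device-name test from the patch; not CloudStack code. */
final class QinQDeviceNames {
    // "vlan<outer>.<inner>", e.g. vlan400.10: a double-tagged guest device.
    private static final Pattern DOUBLE_TAGGED = Pattern.compile("vlan\\d+\\.\\d+");

    /**
     * Returns true when the caller should go one step further and look up
     * the parent of pif. isVlanDevice is a hypothetical stand-in for the
     * /proc/net/vlan/<pif> check done by the patched method.
     */
    static boolean shouldResolveParent(String pif, boolean isVlanDevice) {
        if (!isVlanDevice) {
            return false; // plain NIC such as eth0: already the physical device
        }
        // eth1.102   -> resolve to eth1
        // vlan400.10 -> resolve to vlan400
        // vlan400    -> keep as-is, it is treated as the "physical" device
        return !pif.startsWith("vlan") || DOUBLE_TAGGED.matcher(pif).matches();
    }

    /** Guest device name for an inner tag on an outer vlan# device. */
    static String guestDeviceName(String outerDevice, int innerTag) {
        return outerDevice + "." + innerTag;
    }

    public static void main(String[] args) {
        for (String pif : new String[] {"eth1.102", "vlan400", "vlan400.10"}) {
            System.out.println(pif + " -> resolve parent: " + shouldResolveParent(pif, true));
        }
        System.out.println(guestDeviceName("vlan400", 10));
    }
}
```

The pattern is applied with a full match, so a single-tagged `vlan400` falls through and stays the physical device, while `vlan400.10` is resolved one level up.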
>>>>>>>>>>>
>>>>>>>>>>> This is especially nice with system VM routers and VPC (cloudstack
>>>>>>>>>>> takes care of everything), but admittedly external routers probably
>>>>>>>>>>> will have spotty support for being able to route double tagged
>>>>>>>>>>> stuff. I'm also a bit afraid that if I were to get it merged in
>>>>>>>>>>> that it would just become this undocumented hack thing that few
>>>>>>>>>>> know about and nobody uses. So I'm looking for feedback on whether
>>>>>>>>>>> this sounds useful enough to commit, how it should be documented,
>>>>>>>>>>> and whether it makes sense to hint at this in the GUI somehow.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> This actually sounds amazing Marcus. I'd love to see and use this
>>>>>>>> implementation.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Æ
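The MTU trade-off raised earlier in the thread comes down to simple arithmetic, sketched below. The class `QinQMtu` and its method names are hypothetical. The 32-byte headroom is taken from the thread's own figures (1500 to 1468, 9000 to 8968, or a switch bumped to 1532). A single 802.1Q tag is 4 bytes, so that headroom is a conservative margin, not the exact size of one extra tag. How a given switch counts tag bytes against its limit varies by vendor and is not modeled here.

```java
/** Illustrative sketch of the MTU headroom arithmetic discussed in the thread. */
final class QinQMtu {
    static final int DOT1Q_TAG_BYTES = 4;        // size of one 802.1Q tag
    static final int THREAD_HEADROOM_BYTES = 32; // margin implied by the thread's figures

    /** Largest guest MTU when the switch MTU is left unchanged. */
    static int maxGuestMtu(int switchMtu, int headroomBytes) {
        if (headroomBytes < 0 || headroomBytes >= switchMtu) {
            throw new IllegalArgumentException("headroom must be in [0, switchMtu)");
        }
        return switchMtu - headroomBytes;
    }

    /** Switch MTU needed to leave the guest MTU unchanged. */
    static int requiredSwitchMtu(int guestMtu, int headroomBytes) {
        if (guestMtu <= 0 || headroomBytes < 0) {
            throw new IllegalArgumentException("guestMtu must be positive, headroom non-negative");
        }
        return guestMtu + headroomBytes;
    }

    public static void main(String[] args) {
        System.out.println(maxGuestMtu(1500, THREAD_HEADROOM_BYTES));
        System.out.println(maxGuestMtu(9000, THREAD_HEADROOM_BYTES));
        System.out.println(requiredSwitchMtu(1500, THREAD_HEADROOM_BYTES));
    }
}
```

Either direction works: lower the guest MTU and leave the switches alone, or keep guests at 1500 and raise the switch MTU by the same headroom.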