No, you are right. VMware has done a lot of work through things like VAAI and Storage DRS to mitigate, and in some cases eliminate, many of these issues. However, not everyone runs VMware. (Although I must admit, being at VMworld this week, I wouldn't tell any of these folks that ;-)
On Tue, Aug 28, 2012 at 10:38 AM, Ivan Pepelnjak <[email protected]> wrote:

> Storage I/O Control (SIOC) is supposed to work at the datastore level across multiple hypervisors (previous disclaimer still applies :) )
>
> On 8/28/12 7:32 PM, Stiliadis, Dimitrios (Dimitri) wrote:
>
>> ;) see my "multiplexed" comment below (i.e. two servers mounting the same file system).
>>
>> I/O shares work on the server side ...
>>
>> Cheers,
>>
>> Dimitri
>>
>> On 8/28/12 10:25 AM, "Ivan Pepelnjak" <[email protected]> wrote:
>>
>>> If I understood the vSphere manuals and the discussions on various blogs/forums correctly, VMware solved most of this problem a long time ago with I/O shares and a few other features ... but don't trust a networking guy to know anything about storage :)
>>>
>>> On 8/28/12 7:10 PM, Jon Hudson wrote:
>>>
>>>> Dead on.
>>>>
>>>> Any time you have a fan-in/fan-out type traffic flow with filesystem info, one person can ruin the party for everyone. FCoE is a perfect example, where a pause frame sent on an aggregation link can end up impacting many initiators. Or even at the controller level of any array you can get a traffic jam of sorts on poorly designed and laid-out subsystems. Or too few lines for food at an IETF social.
>>>>
>>>> Lots can be done with queues etc. to mitigate the issue, but it is always something to be mindful of, especially if your remote filesystem is not just a mounted LUN but the main system/boot LUN and you have Windows paging over the wire.
>>>>
>>>> On Aug 28, 2012, at 9:55 AM, "Stiliadis, Dimitrios (Dimitri)" <[email protected]> wrote:
>>>>
>>>>> FCoE is clearly not a requirement ...
>>>>>
>>>>> But there is something to be said about storage (and I should have responded in the other email about this). In general, storage isolation is done at the storage level and not at the network layer, so we can ignore it.
>>>>>
>>>>> If we take a storage server that exports a file system that is mounted by a hypervisor, and multiple tenants have their VMs in that file system, then a single network connection between the hypervisor and the storage device could potentially lead to head-of-line blocking and allow one tenant to influence the performance of another. If my memory serves me correctly, VMware for example can only use two or four iSCSI initiators that have to be shared by the different VMs of the hypervisor, so traffic from multiple tenants is multiplexed onto the same network flow. This means that storage drivers/devices have to take care of traffic isolation. That can be perfectly fine in point-to-point situations, but it can get interesting in multiplexed scenarios ...
>>>>>
>>>>> (But we just don't want the storage guys to blame the network guys for performance issues ;)
>>>>>
>>>>> Dimitri
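To make the head-of-line-blocking point above concrete, here is a minimal, purely illustrative Python sketch. It is not tied to any real iSCSI or VMware implementation; the tenant names and request sizes are made up. It compares two tenants whose storage requests are multiplexed onto one FIFO flow with the same tenants served from separate, isolated queues.

    # Toy model: two tenants' storage requests; service time ~ request size.
    requests = [("A", 100), ("B", 1), ("A", 100), ("B", 1), ("B", 1), ("B", 1)]

    def shared_fifo(reqs):
        """All tenants multiplexed onto one network flow: strict FIFO."""
        clock, finish = 0, {}
        for tenant, size in reqs:
            clock += size  # each request occupies the single shared flow
            finish.setdefault(tenant, []).append(clock)
        return finish

    def per_tenant_queues(reqs):
        """Idealised isolation: one independently served queue per tenant."""
        clocks, finish = {}, {}
        for tenant, size in reqs:
            clocks[tenant] = clocks.get(tenant, 0) + size
            finish.setdefault(tenant, []).append(clocks[tenant])
        return finish

    if __name__ == "__main__":
        print("shared flow      :", shared_fifo(requests))
        print("per-tenant queues:", per_tenant_queues(requests))
        # Shared flow: B's small requests finish at 101, 202, 203, 204
        # because they sit behind A's large ones; with per-tenant queues
        # they finish at 1, 2, 3, 4.

In the shared-flow case tenant A's large requests delay tenant B's small ones, which is exactly the cross-tenant interference being discussed.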
>>>>> On 8/28/12 9:44 AM, "Ivan Pepelnjak" <[email protected]> wrote:
>>>>>
>>>>>> In sane real-life designs the virtual network overlay solution would not transport FCoE. I'm also positive someone will come up with exactly that requirement sooner rather than later :D
>>>>>>
>>>>>> On 8/28/12 6:40 PM, Aldrin Isaac wrote:
>>>>>>
>>>>>> The question regarding FCoE is whether overlay solutions need to transport it. I think the answer is no. If something operates at the underlay level then it isn't in scope for NVO3, including DCB.
>>>>>>
>>>>>> On Tuesday, August 28, 2012, Somesh Gupta wrote:
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: [email protected] [mailto:[email protected]] On Behalf Of Ivan Pepelnjak
>>>>>>> Sent: Tuesday, August 28, 2012 12:22 AM
>>>>>>> To: Stiliadis, Dimitrios (Dimitri)
>>>>>>> Cc: Black, David; [email protected]; Linda Dunbar
>>>>>>> Subject: Re: [nvo3] Let's refocus on real world (was: Comments on Live Migration and VLAN-IDs)
>>>>>>>
>>>>>>> Dimitri,
>>>>>>>
>>>>>>> We're more in agreement than it might seem. I might have my doubts about the operational viability of the OpenStack-to-bare-metal use case you described below, but I'm positive someone will try to do that as well.
>>>>>>>
>>>>>>> In any case, regardless of whether we're considering VMs or bare-metal servers, in the simplest scenario the server-to-NVE connection is a point-to-point link, usually without VLAN tagging.
>>>>>>>
>>>>>>> In the VM/hypervisor case, the NVE is implemented in the hypervisor soft switch; in the bare-metal server case, it has to be implemented in the ToR switch.
>>>>>>
>>>>>> This is certainly only today's restriction. If NVO3 takes off, there certainly could be a pseudo-driver in Linux that could implement the NVE (like a VLAN driver) without much additional overhead.
>>>>>>
>>>>>>> It's important to keep in mind the limitations of ToR switches to ensure whatever solution we agree upon will be implementable in ToR switches as well, but it makes absolutely no sense to assume the NVE will not be in the hypervisor (because someone wants to support a customer having a decade-old VLAN-only hypervisor soft switch).
>>>>>>>
>>>>>>> As for ToR switch capabilities, Dell has demonstrated NVGRE support and Arista is right now showing off a hardware VXLAN VTEP prototype, so I guess it's safe to assume next-generation merchant silicon will support GRE- and UDP-based encapsulations well before we'll agree on what the NVO3 solution should be.
>>>>>>>
>>>>>>> Finally, can at least some of us agree that the topology that makes the most sense is a direct P2P link between a (VM or bare-metal) server and the NVE, using VLAN tagging only when a server participating in multiple L2 CUGs has interface limitations?
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Ivan
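For readers less familiar with what the NVE (whether in the hypervisor soft switch or in a ToR VTEP) actually does on the wire, here is a minimal sketch of the VXLAN-style MAC-over-UDP encapsulation being discussed. It is illustrative only: the VNI, the addresses, and the inner Ethernet frame are made up, and a real VTEP lives in the kernel, soft switch, or silicon rather than in user-space Python.

    import socket
    import struct

    VXLAN_PORT = 4789  # IANA-assigned VXLAN UDP port

    def vxlan_encapsulate(inner_ethernet_frame: bytes, vni: int) -> bytes:
        """Prepend the 8-byte VXLAN header (RFC 7348 layout) to an inner
        Ethernet frame. Flags = 0x08 marks the VNI as valid; the 24-bit
        VNI is followed by a reserved byte."""
        header = struct.pack("!II", 0x08 << 24, (vni & 0xFFFFFF) << 8)
        return header + inner_ethernet_frame

    if __name__ == "__main__":
        # Made-up inner frame: dst MAC, src MAC, ethertype, dummy payload.
        inner = (bytes.fromhex("ffffffffffff") +   # broadcast destination
                 bytes.fromhex("0242ac110002") +   # hypothetical source MAC
                 b"\x08\x06" +                     # ARP ethertype
                 b"\x00" * 28)                     # dummy ARP body
        packet = vxlan_encapsulate(inner, vni=5000)

        # The outer IP/UDP headers come from the host stack here;
        # 192.0.2.10 stands in for the remote VTEP's address.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.sendto(packet, ("192.0.2.10", VXLAN_PORT))

The other encapsulations mentioned in the thread (NVGRE, MAC-over-GRE, STT) differ in the outer header and the tenant-ID field, but the basic operation at the NVE is the same: wrap the tenant's MAC frame in an IP-based envelope carrying a virtual network identifier.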
>>>>>>> On 8/27/12 6:55 AM, Stiliadis, Dimitrios (Dimitri) wrote:
>>>>>>>
>>>>>>>> Ivan:
>>>>>>>>
>>>>>>>> I agree and at the same time disagree with some of the statements below. I would like to understand your view.
>>>>>>>>
>>>>>>>> See inline:
>>>>>>>>
>>>>>>>> On 8/25/12 8:22 AM, "Ivan Pepelnjak" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On 8/24/12 11:11 PM, Linda Dunbar wrote:
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>>> But most, if not all, data centers today don't have hypervisors which can encapsulate the NVO3-defined header. The deployment to 100% NVO3-header-based servers won't happen overnight. One thing is for sure: you will see data centers with mixed types of servers for a very long time.
>>>>>>>>>>
>>>>>>>>>> If NVEs are in the ToR, you will see a mixed scenario of blade servers, servers with simple virtual switches, or even IEEE 802.1Qbg's VEPA. So it is necessary for NVO3 to deal with the "L2 Site" defined in this draft.
>>>>>>>>>
>>>>>>>>> There are two hypothetical ways of implementing NVO3: existing layer-2 technologies (with the well-known scaling properties that prompted the creation of the NVO3 working group) or something-over-IP encapsulation.
>>>>>>>>>
>>>>>>>>> I might be myopic, but from what I see, most data centers today (at least based on the market shares of individual vendors) don't have ToR switches that would be able to encapsulate MAC frames or IP datagrams in UDP, GRE or MPLS envelopes. I am not familiar enough with the commonly used merchant silicon hardware to understand whether that's a software or hardware limitation. In any case, I wouldn't expect switch vendors to roll out NVO3-like something-over-IP solutions any time soon.
>>>>>>>>>
>>>>>>>>> On the hypervisor front, VXLAN has been shipping for months, NVGRE is included in the next version of Hyper-V, and MAC-over-GRE is available (with Open vSwitch) for both KVM and Xen. Open vSwitch is also part of the standard Linux kernel distribution and thus available to any other Linux-based hypervisor product.
>>>>>>>>>
>>>>>>>>> So: all major hypervisors have MAC-over-IP solutions, each one using a proprietary encapsulation because there's no standard way of doing it, and yet we're spending time discussing and documenting the history of the evolution of virtual networking. Maybe we should be a bit more forward-looking, acknowledge the world has changed, and come up with a relevant hypervisor-based solution.
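As a concrete illustration of the MAC-over-GRE option mentioned for Open vSwitch, the hedged Python sketch below shows the kind of configuration involved on each hypervisor: create an integration bridge and add a GRE tunnel port pointing at the peer hypervisor's address. It assumes a host with Open vSwitch and its ovs-vsctl utility installed; the bridge name, port name, and the 192.0.2.2 remote address are placeholders, not values from this thread.

    import subprocess

    def sh(*args: str) -> None:
        """Run a command and fail loudly, so misconfigurations are visible."""
        subprocess.run(args, check=True)

    def setup_gre_overlay(bridge: str, tunnel_port: str, remote_ip: str) -> None:
        # Integration bridge the VMs' vNICs would be attached to.
        sh("ovs-vsctl", "--may-exist", "add-br", bridge)
        # GRE tunnel port: Open vSwitch encapsulates MAC frames from the
        # bridge in GRE towards the remote hypervisor's tunnel endpoint.
        sh("ovs-vsctl", "--may-exist", "add-port", bridge, tunnel_port,
           "--", "set", "interface", tunnel_port,
           "type=gre", "options:remote_ip=" + remote_ip)

    if __name__ == "__main__":
        setup_gre_overlay("br-int", "gre0", "192.0.2.2")

The same two-command pattern, with a different interface type, is how a VXLAN or other IP-based tunnel port would be added, which is part of why the encapsulation choice matters less than agreeing on one.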
>>>>>>>> Correct, and here is where the IETF as a standards body fails. There is no easy way (any time soon) for a VXLAN-based solution to talk to an NVGRE, MAC/GRE, CloudStack MAC/GRE, or STT (you forgot this one) based solution. Proprietary approaches drive enterprises into vendor lock-in. And instead of trying to address the first problem, which is about "interoperability", [...]

--
"Do not lie. And do not do what you hate."
_______________________________________________
nvo3 mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/nvo3
