No, you are right. VMware has done a lot of work through things like VAAI and Storage DRS to mitigate, and in some cases eliminate, many of these issues. However, not everyone runs VMware. (Although I must admit, being at VMworld this week, I wouldn't tell any of these folks that ;-)
On Tue, Aug 28, 2012 at 10:38 AM, Ivan Pepelnjak <[email protected]> wrote:

> Storage I/O Control (SIOC) is supposed to work at the datastore level across multiple hypervisors (previous disclaimer still applies :) )
>
> On 8/28/12 7:32 PM, Stiliadis, Dimitrios (Dimitri) wrote:
>
>> ;) see my "multiplexed" comment below (i.e. two servers mounting the same file system).
>>
>> I/O shares work on the server side ...
>>
>> Cheers,
>>
>> Dimitri
>>
>> On 8/28/12 10:25 AM, "Ivan Pepelnjak" <[email protected]> wrote:
>>
>>> If I understood the vSphere manuals and the discussions on various blogs/forums correctly, VMware solved most of this problem a long time ago with I/O shares and a few other features ... but don't trust a networking guy to know anything about storage :)
>>>
>>> On 8/28/12 7:10 PM, Jon Hudson wrote:
>>>
>>>> Dead on.
>>>>
>>>> Any time you have a fan-in/fan-out type traffic flow with filesystem info, one person can ruin the party for everyone. FCoE is a perfect example, where a pause frame sent on an aggregation link can end up impacting many initiators. Or even at the controller level of any array you can get a traffic jam of sorts on poorly designed and laid-out subsystems. Or too few lines for food at an IETF social.
>>>>
>>>> Lots can be done with queues etc. to mitigate the issue, but it is always something to be mindful of, especially if your remote filesystem is not just a mounted LUN but the main system/boot LUN and you have Windows paging over the wire.
>>>>
>>>> On Aug 28, 2012, at 9:55 AM, "Stiliadis, Dimitrios (Dimitri)" <[email protected]> wrote:
>>>>
>>>>> FCoE is clearly not a requirement ...
>>>>>
>>>>> But there is something to be said about storage (and I should have responded in the other email about this). In general, storage isolation is done at the storage level and not at the network layer, so we can ignore it.
>>>>>
>>>>> If we take a storage server that exports a file system that is mounted by a hypervisor, and multiple tenants have their VMs in that file system, then a single network connection between the hypervisor and the storage device could potentially lead to head-of-line blocking and allow one tenant to influence the performance of another. If my memory serves me correctly, VMware for example can only use two or four iSCSI initiators that have to be shared by the different VMs of the hypervisor, so traffic from multiple tenants is multiplexed onto the same network flow. This means that storage drivers/devices have to take care of traffic isolation. That can be perfectly fine in point-to-point situations, but it can get interesting in multiplexed scenarios ...
>>>>>
>>>>> (But we just don't want the storage guys to blame the network guys for performance issues ;)
>>>>>
>>>>> Dimitri
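To make the head-of-line-blocking point above concrete, here is a minimal, purely illustrative Python sketch. It is not tied to any real iSCSI or VMware implementation; the tenant names and request sizes are made up. It compares two tenants whose storage requests are multiplexed onto one FIFO flow with the same tenants served from separate, isolated queues.

    # Toy model: two tenants' storage requests; service time ~ request size.
    requests = [("A", 100), ("B", 1), ("A", 100), ("B", 1), ("B", 1), ("B", 1)]

    def shared_fifo(reqs):
        """All tenants multiplexed onto one network flow: strict FIFO."""
        clock, finish = 0, {}
        for tenant, size in reqs:
            clock += size  # each request occupies the single shared flow
            finish.setdefault(tenant, []).append(clock)
        return finish

    def per_tenant_queues(reqs):
        """Idealised isolation: one independently served queue per tenant."""
        clocks, finish = {}, {}
        for tenant, size in reqs:
            clocks[tenant] = clocks.get(tenant, 0) + size
            finish.setdefault(tenant, []).append(clocks[tenant])
        return finish

    if __name__ == "__main__":
        print("shared flow      :", shared_fifo(requests))
        print("per-tenant queues:", per_tenant_queues(requests))
        # Shared flow: B's small requests finish at 101, 202, 203, 204
        # because they sit behind A's large ones; with per-tenant queues
        # they finish at 1, 2, 3, 4.

In the shared-flow case tenant A's large requests delay tenant B's small ones, which is exactly the cross-tenant interference being discussed.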
>>>>> On 8/28/12 9:44 AM, "Ivan Pepelnjak" <[email protected]> wrote:
>>>>>
>>>>>> In sane real-life designs the virtual network overlay solution would not transport FCoE. I'm also positive someone will come up with exactly that requirement sooner rather than later :D
>>>>>>
>>>>>> On 8/28/12 6:40 PM, Aldrin Isaac wrote:
>>>>>>
>>>>>> The question regarding FCoE is whether overlay solutions need to transport it. I think the answer is no. If something operates at the underlay level then it isn't in scope for NVO3, including DCB.
>>>>>>
>>>>>> On Tuesday, August 28, 2012, Somesh Gupta wrote:
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: [email protected] [mailto:[email protected]] On Behalf Of Ivan Pepelnjak
>>>>>>> Sent: Tuesday, August 28, 2012 12:22 AM
>>>>>>> To: Stiliadis, Dimitrios (Dimitri)
>>>>>>> Cc: Black, David; [email protected]; Linda Dunbar
>>>>>>> Subject: Re: [nvo3] Let's refocus on real world (was: Comments on Live Migration and VLAN-IDs)
>>>>>>>
>>>>>>> Dimitri,
>>>>>>>
>>>>>>> We're more in agreement than it might seem. I might have my doubts about the operational viability of the OpenStack-to-bare-metal use case you described below, but I'm positive someone will try to do that as well.
>>>>>>>
>>>>>>> In any case, regardless of whether we're considering VMs or bare-metal servers, in the simplest scenario the server-to-NVE connection is a point-to-point link, usually without VLAN tagging.
>>>>>>>
>>>>>>> In the VM/hypervisor case, the NVE is implemented in the hypervisor soft switch; in the bare-metal server case, it has to be implemented in the ToR switch.
>>>>>>
>>>>>> This is certainly only today's restriction. If NVO3 takes off, there certainly could be a pseudo-driver in Linux that could implement the NVE (like a VLAN driver) without much additional overhead.
>>>>>>
>>>>>>> It's important to keep in mind the limitations of ToR switches to ensure whatever solution we agree upon will be implementable in ToR switches as well, but it makes absolutely no sense to assume the NVE will not be in the hypervisor (because someone wants to support a customer having a decade-old VLAN-only hypervisor soft switch).
>>>>>>>
>>>>>>> As for ToR switch capabilities, Dell has demonstrated NVGRE support and Arista is right now showing off a hardware VXLAN VTEP prototype, so I guess it's safe to assume next-generation merchant silicon will support GRE- and UDP-based encapsulations well before we'll agree on what the NVO3 solution should be.
>>>>>>>
>>>>>>> Finally, can at least some of us agree that the topology that makes the most sense is a direct P2P link between a (VM or bare-metal) server and the NVE, using VLAN tagging only when a server participating in multiple L2 CUGs has interface limitations?
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Ivan
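For readers less familiar with what the NVE (whether in the hypervisor soft switch or in a ToR VTEP) actually does on the wire, here is a minimal sketch of the VXLAN-style MAC-over-UDP encapsulation being discussed. It is illustrative only: the VNI, the addresses, and the inner Ethernet frame are made up, and a real VTEP lives in the kernel, soft switch, or silicon rather than in user-space Python.

    import socket
    import struct

    VXLAN_PORT = 4789  # IANA-assigned VXLAN UDP port

    def vxlan_encapsulate(inner_ethernet_frame: bytes, vni: int) -> bytes:
        """Prepend the 8-byte VXLAN header (RFC 7348 layout) to an inner
        Ethernet frame. Flags = 0x08 marks the VNI as valid; the 24-bit
        VNI is followed by a reserved byte."""
        header = struct.pack("!II", 0x08 << 24, (vni & 0xFFFFFF) << 8)
        return header + inner_ethernet_frame

    if __name__ == "__main__":
        # Made-up inner frame: dst MAC, src MAC, ethertype, dummy payload.
        inner = (bytes.fromhex("ffffffffffff") +   # broadcast destination
                 bytes.fromhex("0242ac110002") +   # hypothetical source MAC
                 b"\x08\x06" +                     # ARP ethertype
                 b"\x00" * 28)                     # dummy ARP body
        packet = vxlan_encapsulate(inner, vni=5000)

        # The outer IP/UDP headers come from the host stack here;
        # 192.0.2.10 stands in for the remote VTEP's address.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.sendto(packet, ("192.0.2.10", VXLAN_PORT))

The other encapsulations mentioned in the thread (NVGRE, MAC-over-GRE, STT) differ in the outer header and the tenant-ID field, but the basic operation at the NVE is the same: wrap the tenant's MAC frame in an IP-based envelope carrying a virtual network identifier.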
>>>>>>> On 8/27/12 6:55 AM, Stiliadis, Dimitrios (Dimitri) wrote:
>>>>>>>
>>>>>>>> Ivan:
>>>>>>>>
>>>>>>>> I agree and at the same time disagree with some of the statements below. I would like to understand your view.
>>>>>>>>
>>>>>>>> See inline:
>>>>>>>>
>>>>>>>> On 8/25/12 8:22 AM, "Ivan Pepelnjak" <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On 8/24/12 11:11 PM, Linda Dunbar wrote:
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>>> But most, if not all, data centers today don't have hypervisors which can encapsulate the NVO3-defined header. The deployment to 100% NVO3-header-based servers won't happen overnight. One thing is for sure: you will see data centers with mixed types of servers for a very long time.
>>>>>>>>>>
>>>>>>>>>> If NVEs are in the ToR, you will see a mixed scenario of blade servers, servers with simple virtual switches, or even IEEE 802.1Qbg's VEPA. So it is necessary for NVO3 to deal with the "L2 Site" defined in this draft.
>>>>>>>>>
>>>>>>>>> There are two hypothetical ways of implementing NVO3: existing layer-2 technologies (with the well-known scaling properties that prompted the creation of the NVO3 working group) or something-over-IP encapsulation.
>>>>>>>>>
>>>>>>>>> I might be myopic, but from what I see, most data centers today (at least based on the market shares of individual vendors) don't have ToR switches that would be able to encapsulate MAC frames or IP datagrams in UDP, GRE or MPLS envelopes. I am not familiar enough with the commonly used merchant silicon hardware to understand whether that's a software or hardware limitation. In any case, I wouldn't expect switch vendors to roll out NVO3-like something-over-IP solutions any time soon.
>>>>>>>>>
>>>>>>>>> On the hypervisor front, VXLAN has been shipping for months, NVGRE is included in the next version of Hyper-V, and MAC-over-GRE is available (with Open vSwitch) for both KVM and Xen. Open vSwitch is also part of the standard Linux kernel distribution and thus available to any other Linux-based hypervisor product.
>>>>>>>>>
>>>>>>>>> So: all major hypervisors have MAC-over-IP solutions, each one using a proprietary encapsulation because there's no standard way of doing it, and yet we're spending time discussing and documenting the history of the evolution of virtual networking. Maybe we should be a bit more forward-looking, acknowledge the world has changed, and come up with a relevant hypervisor-based solution.
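As a concrete illustration of the MAC-over-GRE option mentioned for Open vSwitch, the hedged Python sketch below shows the kind of configuration involved on each hypervisor: create an integration bridge and add a GRE tunnel port pointing at the peer hypervisor's address. It assumes a host with Open vSwitch and its ovs-vsctl utility installed; the bridge name, port name, and the 192.0.2.2 remote address are placeholders, not values from this thread.

    import subprocess

    def sh(*args: str) -> None:
        """Run a command and fail loudly, so misconfigurations are visible."""
        subprocess.run(args, check=True)

    def setup_gre_overlay(bridge: str, tunnel_port: str, remote_ip: str) -> None:
        # Integration bridge the VMs' vNICs would be attached to.
        sh("ovs-vsctl", "--may-exist", "add-br", bridge)
        # GRE tunnel port: Open vSwitch encapsulates MAC frames from the
        # bridge in GRE towards the remote hypervisor's tunnel endpoint.
        sh("ovs-vsctl", "--may-exist", "add-port", bridge, tunnel_port,
           "--", "set", "interface", tunnel_port,
           "type=gre", "options:remote_ip=" + remote_ip)

    if __name__ == "__main__":
        setup_gre_overlay("br-int", "gre0", "192.0.2.2")

The same two-command pattern, with a different interface type, is how a VXLAN or other IP-based tunnel port would be added, which is part of why the encapsulation choice matters less than agreeing on one.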
>>>>>>>> Correct, and here is where the IETF as a standards body fails. There is no easy way (any time soon) for a VXLAN-based solution to talk to an NVGRE, MAC/GRE, CloudStack MAC/GRE, or STT (you forgot this one) based solution. Proprietary approaches drive enterprises into vendor lock-in. And instead of trying to address the first problem, which is about "interoperability", [...]

--
"Do not lie. And do not do what you hate."
_______________________________________________
nvo3 mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/nvo3
