Hi,

I double-checked with all of my hardware teams. Reserved bits are usually ignored on reception, so there is no side effect on VXLAN hardware. I believe TCP was also extended to support ECN this way. Also, we are keeping this within the network, so the server/VM won't see this traffic.
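To make the reserved-bit point concrete, here is a minimal sketch of a receiver that only checks the I flag from RFC 7348 and ignores every reserved bit. The position of the PD bit (ASSUMED_PD_FLAG) and all function names are assumptions for illustration only; they are not taken from the draft or from any ASIC.

```python
import struct

VXLAN_FLAG_I = 0x08      # "VNI present" flag defined by RFC 7348
ASSUMED_PD_FLAG = 0x01   # hypothetical position of the PD/OAM bit (assumption)


def build_vxlan_header(vni, flags=VXLAN_FLAG_I):
    """Pack the 8-byte header: flags(8) | reserved(24) | VNI(24) | reserved(8)."""
    return struct.pack("!II", flags << 24, vni << 8)


def parse_vni_ignoring_reserved(header):
    """Receiver that validates only the I flag and ignores all reserved bits."""
    word0, word1 = struct.unpack("!II", header[:8])
    flags = word0 >> 24
    if not flags & VXLAN_FLAG_I:
        raise ValueError("I flag not set: VNI is not valid")
    return word1 >> 8


def has_assumed_pd_flag(header):
    """True if the hypothetical PD bit is set in the flags byte."""
    return bool(header[0] & ASSUMED_PD_FLAG)


plain = build_vxlan_header(5000)
marked = build_vxlan_header(5000, VXLAN_FLAG_I | ASSUMED_PD_FLAG)
```

A receiver written this way extracts the same VNI from `plain` and `marked`. Hardware that instead validates the reserved bits would behave differently, which is the case the thread is asking about.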
When the new hardware-friendly encapsulations arrive (Geneve/GUE/GPE; we will respin the ASIC if required), we will have an even more powerful hardware OAM solution: we will be able to generate OAM packets from the ingress pipeline with the OAM bit set, so we verify the actual datapath with QoS at the initiating switch. A hardware-based OAM solution for hardware VTEPs is necessary because TTL expiry and the other existing solutions can't address all the problems.

1. As I said, I like TTL expiry, but it doesn't verify the exact datapath that real VXLAN traffic takes on the leaf, so it is more of a better-than-nothing solution. Based on the few implementations I know from our ASICs, the TTL pipeline in the ASIC occurs before the path for the VXLAN packet, so it doesn't verify the exact datapath, the VNI to VLAN/BD mapping, etc. When a packet is received on the core-facing interface from the core side, the platform adds its own header and decrements the TTL; that causes an exception, so the complete packet is sent to the software slow path.

   When a packet is forwarded in hardware, it goes core port <--> core -- VTEP -- bridge -- edge port. So the steps are:

   1. Add the platform-specific header.
   2. Check the packet for forwarding and whether it is for us (because the IP matches the VTEP).
   3. Decrement the TTL.
   4. Check for a TTL exception; on exception, punt to the software slow path.
   5. As the packet matches the peer VTEP IP, decapsulate it and use the VNI to add the VLAN header to the inner packet.
   6. Bridge the packet towards the host on that VLAN.

   Also, new hardware has resilient ECMP, where software and hardware ECMP are not the same, so with the TTL expiry method it will be a little hard to get the egress interface even in the underlay.

2. Also, leafs are deployed in a resilient manner, so the traffic flow is not just VTEP to VTEP over the same path.

    ------- L3 core -------
               |
            L1--L2   --> Both leafs acting as Virtual VTEP IP.
             \  /
              \/
              VM

   What happens is that traffic to the VM can be hashed using ECMP to either L1 or L2. If the link between the VM and L1 goes down for some reason, this information is local and not known to the core, so packets will still be delivered to both L1 and L2, but traffic to the VM is then delivered over the link connecting L1 -> L2 -> VM.

3. Even on the initiator leaf, the VM path can't be traced without having the VM's traffic profile. For example, if the VM does a ping between two endpoints, it doesn't cover the same path as real traffic, because the hardware at the initiator does L3 hashing for it instead of the L4 hashing used for UDP/TCP flows. VM hashing is required to get the source port, and after that, if the packet is delivered from software, we need to find the right egress interface, which requires the outer header. In this scenario the hashing is different because of the tunnel; it is implementation specific, but usually different from the L3 hashing in the core.

In the case of VXLAN-GPE, we won't have to do all these software tricks, as we can insert the packet right from the ingress pipeline and it will cover the same path as data, so we can move the bit position for GPE, Geneve and GUE as appropriate.

Thanks,
Deepak

From: Haoweiguo <[email protected]>
Date: Wednesday, November 4, 2015 at 4:31 AM
To: dekumar <[email protected]>, Sam Aldrin <[email protected]>
Cc: Shahram Davari <[email protected]>, "[email protected]" <[email protected]>, Dacheng Zhang <[email protected]>
Subject: RE: [nvo3] draft-pang-nvo3-vxlan-path-detection-01

Hi Sam,

The extra bit in the VXLAN reserved field has no side effect on the regular VXLAN forwarding process. The hardware requirements for intermediate nodes are also low: the intermediate nodes only need to punt the data packets carrying the OAM flag to the control plane using a regular ACL, and most current commercial chipsets can support this behavior.
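As a rough illustration of the ACL behaviour described here, combined with the punt-and-drop / punt-and-forward policy example quoted further down this thread, the match logic could be sketched as below. This is only a sketch under stated assumptions: the PD bit position, the OuterHeaders type, the action names and the classify function are all hypothetical and do not describe any chipset's ACL syntax.

```python
from dataclasses import dataclass

VXLAN_UDP_PORT = 4789    # destination port used in the policy example
ASSUMED_PD_FLAG = 0x01   # hypothetical position of the PD/OAM bit (assumption)


@dataclass(frozen=True)
class OuterHeaders:
    """Fields of the outer packet that the sketched rule matches on."""
    dst_ip: str
    udp_dst_port: int
    vxlan_flags: int


def classify(pkt, vtep_ip):
    """Return the action for a packet, following the two-rule policy example."""
    is_vxlan = pkt.udp_dst_port == VXLAN_UDP_PORT
    if not is_vxlan or not pkt.vxlan_flags & ASSUMED_PD_FLAG:
        return "forward"            # ordinary traffic is left untouched
    if pkt.dst_ip == vtep_ip:
        return "punt_and_drop"      # terminating VTEP: copy to CPU, do not leak
    return "punt_and_forward"       # transit node: copy to CPU, keep forwarding


transit = classify(OuterHeaders("10.0.0.2", 4789, 0x09), "10.0.0.1")
egress = classify(OuterHeaders("10.0.0.2", 4789, 0x09), "10.0.0.2")
```

Note that even in this sketch the transit node has to match on a byte inside the UDP payload, which is the concern raised elsewhere in the thread about plain L3 routers.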
Thanks,
weiguo

________________________________
From: Deepak Kumar (dekumar) [[email protected]]
Sent: Wednesday, November 04, 2015 14:51
To: Sam Aldrin
Cc: Shahram Davari; [email protected]; Dacheng Zhang
Subject: Re: [nvo3] draft-pang-nvo3-vxlan-path-detection-01

Hi Sam,

The VXLAN field that is used is a reserved field, so existing ASIC-based hardware won't set it on transmit, but receiving a packet with a reserved bit set has no side effect. If the hardware is programmable, there is no issue even on transmit. Can you give me an example of any ASIC implementation that will have a problem? We can add text telling users to be careful before turning on the solution. We can even call this an extension of VXLAN with a PD bit.

Thanks,
Deepak

Sent from my iPhone

On Nov 4, 2015, at 3:09 PM, Sam Aldrin <[email protected]> wrote:

Hi Deepak,

Are you or are you not changing the packet format by introducing a PD flag bit in the reserved field, i.e. changing RFC 7348? If so, how can you claim to be informational? Is it because the RFC is informational? For example, VXLAN-GPE is on the standards track, although it is now in expired state. Irrespective of technical differences, if a specific format is being changed, it will impact existing and future deployments as well, informational or not. Being informational does not avoid that.

-sam

On Nov 3, 2015, at 8:03 PM, Deepak Kumar (dekumar) <[email protected]> wrote:

Hi Sam,

This is a good discussion. We are bringing this draft as an informational draft for a narrow scenario that applies to some operators but not to others. The TTL solution is too slow at scale. Instead of arguing, we can give data on how much time it takes; for some operators that amount of time is okay, but others will want it to complete quickly.
As this is an informational solution, it is brought to the working group as a hardware-driven, controller-controlled scenario, and we can make its language MAY and SHOULD so that all the issues it may cause for software VTEPs can be fixed. Why can't software-based and hardware-based solutions coexist, when an informational draft won't force everyone to implement it?

Thanks
Deepak

Sent from my iPhone

On Nov 4, 2015, at 12:41 PM, Sam Aldrin <[email protected]> wrote:

Hi Deepak,

What you are describing is a very narrow scenario, which has its own pitfalls. My comments are inline.

On Nov 3, 2015, at 7:10 PM, Deepak Kumar (dekumar) <[email protected]> wrote:

Hi Shahram/Sam,

This solution is hardware centric with a controller, and a policy needs to be created on each hop. This solution is not applicable to all scenarios.

Policy example:
   Match peer VTEP IP == destination IP of packet, destination port == 4789, PD bit: action punt and drop.
   Match peer VTEP IP != destination IP, destination port == 4789, PD bit: action punt and forward.

If you want to employ a policy for every VTEP and on every device in the network, that is, IMO, a bad design to start with.

The drop takes care of the leak scenario from the leafs. The controller eats up the packet, so there is no issue of a loop. Also, in the network the packet travels as a data packet as per the VXLAN rule of max TTL, so I am not sure where the loop is.

You mean there cannot be loops in the n/w, just because TTL is used? (loop life is dependent on TTL)

If a loop is there, OAM and data will both suffer.

Yes, both will suffer. You use OAM to detect whether the data plane has a problem or not. With this, it will compound the problem.

A loop with the controller can be avoided, but that's outside the scope. Alibaba is also an operator and is using this in its data center for cloud services. I agree TTL expiry will also work, but that's a software solution and a separate draft, not the intention of this draft.

If you already have a solution, why invent a new one?
Are you saying the controller is not efficient and cannot perform OAM efficiently with the existing TTL mechanism? :D

On the concern about policy application: the controller will apply the policy, and if the network is not hardware OAM capable, they won't initiate it and will use the software OAM method.

Well, you have the answer right there. In other words, if a device cannot support your proposed solution, you will revert to the TTL solution. Why don't you just use that solution instead?

We evaluated multiple ASICs and found that the solution can be done on multiple Broadcom and custom ASICs, and the Alibaba network is running on 2 different Broadcom ASICs.

And your point being? :D

-sam

Thanks
Deepak

Sent from my iPhone

On Nov 4, 2015, at 11:29 AM, Sam Aldrin <[email protected]> wrote:

I expressed the same concern at the last IETF meeting as Shahram raised here. I haven't gotten an explanation yet.
If the TTL expiry mechanism is used, then IP TTL will have to be redefined in order to make a copy and forward to the next hop. But if L3 devices have to read into the VXLAN header to determine whether the OAM bit is set, they need to implement DPI for that.
Secondly, imagine when there exists a loop. In fact, loops do exist even in controller-based networks. Speaking as an operator, as mentioned yesterday, this will cause a packet storm and unintended consequences. Why are we solving a problem that doesn't exist?

-sam

On Nov 3, 2015, at 6:02 PM, Shahram Davari <[email protected]> wrote:

I think your assumption is broken. But you have an alternative method, and that is using TTL expiry.

Thx
SD

From: Dacheng Zhang [mailto:[email protected]]
Sent: Tuesday, November 03, 2015 5:53 PM
To: Shahram Davari; [email protected]
Subject: Re: [nvo3] draft-pang-nvo3-vxlan-path-detection-01

This draft actually proposes a mechanism where the intermediate nodes are required to recognize the VXLAN OAM packets.
If this assumption is broken, the solutions proposed in this draft may not be effective.

Cheers

Dacheng

From: nvo3 <[email protected]> on behalf of Shahram Davari <[email protected]>
Date: Wednesday, November 4, 2015 at 9:33 AM
To: "[email protected]" <[email protected]>
Subject: [nvo3] draft-pang-nvo3-vxlan-path-detection-01

Hi,

This draft needs to address how intermediate L3 routers are going to see these VXLAN OAM packets, since L3 routers just do L3 routing and don't look at the payload to see that it is VXLAN and then that these are PD OAM packets. The only option I can think of is TTL expiry; otherwise it won't work the way it is defined now.

Thx
Shahram

_______________________________________________
nvo3 mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/nvo3
