EVPN's complexity lies in its interaction with bridging. For instance, if one connects two EVPN access circuits with a physical wire (or bridges two VMs over a tunnel), you get a multihomed bridged site. Only one of the access ports can be active; otherwise loops will form.
But let's step back and look at the problem we are trying to solve. If the majority (if not all) of the traffic is IP, and the majority of that is routed, wouldn't it be better to develop a networking solution that is optimized for that majority (and not vice versa)? The question is: what problem does EVPN solve? In the context of the DC, EVPN can only address packets bridged within the same VLAN. If most packets are routed, then EVPN, even with all of its complexity problems addressed, doesn't achieve anything for the routed traffic. I believe it is the wrong tradeoff to design a solution around EVPN (i.e., around bridging).

Maria

From: [email protected] On Behalf Of Aldrin Isaac
Sent: Wednesday, December 19, 2012 2:43 PM
To: Kireeti Kompella
Cc: Thomas Narten; [email protected]
Subject: [nvo3] Multi-subnet VNs [was Re: FW: New Version Notification for draft-yong-nvo3-frwk-dpreq-addition-00.txt]

Hi Kireeti,

In E-VPN, ARP is only flooded when the MAC-IP binding is unknown in BGP. Once it is known, the local PE responds locally to the ARP request. This scales quite well, so it's not the best reason to lean one way or the other.

An alternative for edge routing using EVPN is for an NVE to localize the VNs to which edge routing is desired and stand up a local IP forwarder across these VNs using the IP info in the EVPN routes. If the DMAC on a packet is not present in the EVI and the payload is IP, then pass it to the IP forwarder....

In regards to optimizing multicast, with EVPN this can be done using a VN dedicated to multicast distribution, using the VLAN-based MVR model. It works well and is used today.

Another problem that is addressed in EVPN is that segments can be multihomed using LAG. With IP-only solutions, a physical end station would need to multihome by advertising a loopback IP over multiple physical IP interfaces.

We can have our TORs and use them too!!
:)

Best regards -- aldrin

On Wednesday, December 19, 2012, Kireeti Kompella wrote:

Hi Aldrin,

On Tue, Dec 18, 2012 at 8:29 PM, Aldrin Isaac <[email protected]> wrote:

> Kireeti,
>
> I'm not clear what difference it makes whether a packet is unicast forwarded using MAC address or IP address within a subnet

Two important differences:

a) you don't have to know the MAC address if you forward on IP. I.e., you don't have to propagate the ARP to the destination (flood), get the reply, bind IP to MAC (ARP table), and maintain ARP binding (timeout, validate, etc.). The first is a real problem; the rest are annoyances that become problems at scale. (Note that the ARMD WG was created to address this issue, and you know where that ended.) (Note further that this may be hard to do in general, but in the case of an orchestrated data center, you have the information about where a given IP lives, and you have a control plane (ORACLE) to inform all relevant NVEs. And of course, an overlay to shield the infrastructure from poking its nose into your forwarding behavior -- i.e., the infra doesn't care whether you route or switch TS traffic.)

b) In the quite common case where all traffic from a TS is IP, you don't have to maintain two tables and two forwarding paradigms at the NVE (one for IPs and one for MACs). This is common enough to warrant optimization.

A third difference is that if you have only unicast traffic, you don't have to maintain a multicast tree (for flooding). For some, this is a nice bonus, but I know you have a multicast packet or two in your network :-)

> as long as it gets to the intended destination along the most optimal path, particularly when the price to pay is non-standard behavior (intra-subnet ARP manglers ;}, etc). I understand the argument about the sub-optimal routing from a third site, but when the primary sites end up aggregating prefixes for scaling reasons that argument falls off the table. One way or other the piper gets paid.
One way, the piper gets paid a fair bit more than the other!

> In terms of the real world issue of getting there from here -- personally I haven't seen any vendor working towards a standards-based solution that will allow intra-subnet routing for subnets over HW/TOR-based PE, let alone intra-subnet routing for subnets that span across both hypervisor-based PE and TOR-based PE. This makes me leery of solutions that can only take us half way there, particularly during the transition phase. So if we're talking about network virtualization based purely on hypervisors, "route IP, bridge non-IP" may be realistic if you're willing to accept the caveats, but does not seem to be otherwise.

Good point. Clearly, this is not a local decision: "route IP, bridge non-IP" means that intra-subnet routes are propagated the same way as inter-subnet routes, and thus every NVE, h/w or s/w, must be on the same page.

To make this concrete using BGP VPNs, "route IP, bridge non-IP" means all routes, intra- and inter-subnet, are propagated as IP VPN routes, and E-VPN routes contain MACs without IPs. "Bridge intra-subnet IP and non-IP, route inter-subnet" means inter-subnet routes are propagated as IP VPN routes, and intra-VPN routes as E-VPN MAC+IP routes.

We can have a chat off-list on h/w vendors working towards this. Hopefully, others will weigh the above arguments, and support this. Deployers (like you) have a say in this too :-)

> Btw, I understand how multicast may be less than efficient when building both inter and intra subnet trees for the same IP mcast group that end up overlapping links (maybe even more than twice) -- but I'd like to hear your take on any other *insolvable* issues with regard to multicast.

Isn't that enough? :-) I am not a multicast expert, but I can try to dig up IRB multicast horror stories.

Cheers,
Kireeti.
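For readers who want the two propagation models above in a more mechanical form, here is a minimal Python sketch. It is only an illustration, not anything proposed in the thread: the `Endpoint` type, the `advertise` function, the model names, the tuple layout, and the choices of IPv4 /32 host routes and a subnet prefix for the IRB case are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical route record: (family, mac, ip_or_prefix).
Route = Tuple[str, Optional[str], Optional[str]]


@dataclass(frozen=True)
class Endpoint:
    """A tenant system behind an NVE (illustrative type, not from the thread)."""
    mac: str
    ip: Optional[str]      # None for a non-IP tenant system
    subnet: Optional[str]  # e.g. "10.0.1.0/24"; None if the endpoint has no IP


def advertise(ep: Endpoint, model: str) -> List[Route]:
    """List the routes one endpoint would contribute under each model."""
    if model == "route-ip-bridge-non-ip":
        # E-VPN routes carry MACs without IPs; all IP reachability,
        # intra- and inter-subnet, goes out as IP VPN routes.
        routes: List[Route] = [("evpn", ep.mac, None)]
        if ep.ip is not None:
            # Assumption: IPv4 host route per endpoint.
            routes.append(("ipvpn", None, f"{ep.ip}/32"))
        return routes
    if model == "irb":
        # Intra-subnet reachability is an E-VPN MAC+IP route.
        routes = [("evpn", ep.mac, ep.ip)]
        if ep.subnet is not None:
            # Assumption: inter-subnet reachability is the subnet prefix.
            routes.append(("ipvpn", None, ep.subnet))
        return routes
    raise ValueError(f"unknown model: {model}")


vm = Endpoint(mac="00:00:5e:00:53:01", ip="10.0.1.5", subnet="10.0.1.0/24")
for model in ("route-ip-bridge-non-ip", "irb"):
    print(model, advertise(vm, model))
```

The point of the sketch is only that the choice changes what every NVE must import and install, which is why it cannot be a local decision.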
Best regards -- aldrin

On Tue, Dec 18, 2012 at 6:06 PM, Kireeti Kompella <[email protected]> wrote:
> Hi Thomas,
>
> On Dec 18, 2012, at 09:03 , Thomas Narten
> <[email protected]> wrote:
>
>> Kireeti Kompella
>> <[email protected]> writes:
>>
>>> The solution is simple: route if IP, bridge if not. Yes, one could
>>> do IRB, but why? IRB brings in complications, especially for
>>> multicast. I'm sure someone suggested this already, so put me down
>>> as supporting this view.
>>
>> I'm not sure I understand the difference.
>>
>> From an *NVE* perspective, when it receives a packet (which will have
>> an L2 header), it can look at the Ethertype, and if its IP, it can
>> route it. Otherwise, it can provide normal L2 service. So, in this
>> sense, "route if IP, bridge if not" is straightforward. And more to
>> the point, I assume that if the packet gets L2 service, the entire VN
>> is treated as a *single* broadcast domain. All nodes can reach all
>> other nodes. Right?
>
> Right.
>
>> Just so I understand, how is this different than IRB? What does IRB
>> imply that the above does not?
>
> IRB follows the principle of "bridge when you can, route otherwise". So, an
> IP packet with dest IP in the same subnet actually gets bridged; the
> originator (e.g., the VM) is responsible for ARPing the IP address, slapping
> the right dest MAC on the packet and sending that to the NVE which simply
> forwards based on dest MAC address *without* decrementing the TTL.
>
> If the dest IP is in another subnet, the packet is sent to the gateway (which
> for IRB would be the same NVE), which this time does an IP address lookup,
> decrements TTL and routes the packet.
>
> For multicast, there are even more differences.
>
>> But this is different than what (I believe) Lucy is arguing for. In
>> the case of a multi-subnet VN, you have one VN, but it contains
>> different subnets.
>> Each subnet is intended to be one broadcast domain
>> (i.e., equivalent of a VLAN), so that when sending LL multicast and
>> the like on a specific subnet, such packets are *not* delivered to all
>> nodes in the VN, but only those that are part of subnet.
>
> If one were to configure multiple subnets on a VLAN, I wonder if LL traffic
> goes to all members of the VLAN, or just those in the same subnet as the
> sender. I suspect the former (but don't know).
>
>> This is a more complex type of service to provide. And I'm not sure we
>> need this type of service to be provided by one VN.
>
> Agree.
>
>> A (seemingly
>> simpler) alternative would be to put each subnet in its own VN and
>> allow inter-subnet traffic to be handed as inter-VN traffic. So long
>> as that case is optimized (i.e., the ingress NVE can tunnel directly
>> to the egress NVE without adding triangular routing), this would seem
>> to be a cleaner way to implement this.
>
> Can be done. However, we're on Lucy's topic; mine was "route if IP, bridge
> otherwise"; the goal was to rationalize the need for Layer 2 forwarding for
> non-IP traffic, and inter- and intra-subnet routing.
>
> Kireeti.
>
>> Thomas
>>
>> _______________________________________________
>> nvo3 mailing list
>> [email protected]
>> https://www.ietf.org/mailman/listinfo/nvo3
>
> _______________________________________________
> nvo3 mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/nvo3

--
Kireeti
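The difference between "route if IP, bridge if not" and IRB, as described in the quoted exchange above, can be sketched as two per-frame decision functions. This is a minimal Python illustration under stated assumptions, not an NVE implementation: the function names, the `gateway_mac` parameter, and the choice to treat only IPv4 and IPv6 Ethertypes as "IP" are all hypothetical, and ARP handling is deliberately left out.

```python
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
IP_ETHERTYPES = frozenset({ETHERTYPE_IPV4, ETHERTYPE_IPV6})


def route_if_ip(ethertype: int) -> str:
    """'Route if IP, bridge if not': the Ethertype alone decides.

    Every IP frame gets an IP lookup (and a TTL decrement), whether the
    destination is in the sender's subnet or not.
    """
    return "route" if ethertype in IP_ETHERTYPES else "bridge"


def irb(ethertype: int, dst_mac: str, gateway_mac: str) -> str:
    """IRB: 'bridge when you can, route otherwise'.

    An intra-subnet IP frame carries the destination host's MAC (the sender
    ARPed for it), so it is bridged without touching the TTL. Only IP frames
    addressed to the gateway MAC get an IP lookup and a TTL decrement.
    """
    to_gateway = dst_mac.lower() == gateway_mac.lower()
    if ethertype in IP_ETHERTYPES and to_gateway:
        return "route"
    return "bridge"


GW = "00:00:5e:00:01:01"    # hypothetical gateway MAC on the NVE
PEER = "00:00:5e:00:53:02"  # hypothetical same-subnet host MAC

# Same-subnet IPv4 frame: routed under one model, bridged under the other.
print(route_if_ip(ETHERTYPE_IPV4), irb(ETHERTYPE_IPV4, PEER, GW))
# Inter-subnet IPv4 frame sent to the gateway: routed under both.
print(route_if_ip(ETHERTYPE_IPV4), irb(ETHERTYPE_IPV4, GW, GW))
```

The two functions disagree only on same-subnet IP traffic, which is exactly the case the thread is arguing about.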
