Re: [nvo3] Multi-subnet VNs [was Re: FW: New Version Notification for draft-yong-nvo3-frwk-dpreq-addition-00.txt]

NAPIERALA, MARIA H Thu, 20 Dec 2012 13:37:22 -0800

EVPN complexity lies in the interaction with bridging. For instance, if one 
connects two EVPN access circuits with a physical wire (or bridges two VMs over 
a tunnel), you get a multihomed bridged site. Only one of the access ports can 
be active; otherwise loops will form.

But let's step back and look at the problem we are trying to solve. If the 
majority (if not all) of the traffic is IP, and if the majority of it is routed, 
wouldn't it be better to develop a networking solution that is optimized for 
this majority of traffic (and not vice versa)?

The question is: what problem does EVPN solve? In the context of the DC, EVPN 
can only address packets bridged within the same VLAN. If most packets are 
routed, then EVPN, even if all its complexity problems are addressed, achieves 
nothing for the routed traffic. I believe it is the wrong tradeoff to design a 
solution around EVPN (i.e., around bridging).

Maria

From: [email protected] [mailto:[email protected]] On Behalf Of Aldrin Isaac
Sent: Wednesday, December 19, 2012 2:43 PM
To: Kireeti Kompella
Cc: Thomas Narten; [email protected]
Subject: [nvo3] Multi-subnet VNs [was Re: FW: New Version Notification for 
draft-yong-nvo3-frwk-dpreq-addition-00.txt]

Hi Kireeti,

In E-VPN, ARP is only flooded when the MAC-IP binding is unknown in BGP.  Once 
it is known, the local PE responds locally to the ARP request.  This scales 
quite well, so it's not the best reason to lean one way or the other.
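
The ARP handling described above can be sketched roughly as follows. This is
only an illustration: the table layout, the function name and the returned
action strings are assumptions made for the example, not anything defined by
E-VPN itself.

```python
from typing import Dict

# Hypothetical per-EVI table of IP -> MAC bindings learned from BGP routes.
BindingTable = Dict[str, str]


def handle_arp_request(bindings: BindingTable, target_ip: str) -> str:
    """Return the action a PE takes for an ARP request (illustrative only)."""
    mac = bindings.get(target_ip)
    if mac is not None:
        # Binding is known in BGP: answer locally, no flooding needed.
        return f"reply-locally:{mac}"
    # Binding is unknown: fall back to flooding the request.
    return "flood"


bindings = {"10.0.0.2": "00:11:22:33:44:55"}
print(handle_arp_request(bindings, "10.0.0.2"))
print(handle_arp_request(bindings, "10.0.0.9"))
```

Flooding only happens on a table miss, which is why the ARP load shrinks as
bindings are learned.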

An alternative for edge routing using EVPN is for an NVE to localize the VNs to 
which edge routing is desired and stand up a local IP forwarder across these 
VNs using the IP info in the EVPN routes.  If the DMAC on a packet is not 
present in the EVI and if the payload is IP then pass to the IP forwarder....
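
A minimal sketch of that classification step, under stated assumptions: the
Frame type, the set of MACs standing in for the EVI, and the action names are
all hypothetical. What happens to a non-IP frame with an unknown DMAC is not
specified above, so the last branch is a placeholder.

```python
from dataclasses import dataclass
from typing import Set

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD


@dataclass
class Frame:
    dmac: str
    ethertype: int


def classify(frame: Frame, evi_macs: Set[str]) -> str:
    """Pick a forwarding path for a frame arriving at the NVE."""
    if frame.dmac in evi_macs:
        # DMAC is present in the EVI: ordinary bridging.
        return "bridge"
    if frame.ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6):
        # DMAC miss and IP payload: hand off to the local IP forwarder.
        return "ip-forwarder"
    # Assumption: anything else is left to normal unknown-unicast handling.
    return "l2-unknown-unicast"


evi = {"aa:aa:aa:aa:aa:01"}
print(classify(Frame("aa:aa:aa:aa:aa:01", ETHERTYPE_IPV4), evi))
print(classify(Frame("aa:aa:aa:aa:aa:99", ETHERTYPE_IPV4), evi))
```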

With regard to optimizing multicast, with EVPN this can be done using a VN 
dedicated to multicast distribution, following the VLAN-based MVR model.  It 
works well and is used today.

Another problem that is addressed in EVPN is that segments can be multihomed 
using LAG.  With IP-only solutions, a physical end station would need to 
multihome by advertising a loopback IP over multiple physical IP interfaces.

We can have our TORs and use them too!! :)

Best regards -- aldrin

On Wednesday, December 19, 2012, Kireeti Kompella wrote:
Hi Aldrin,
On Tue, Dec 18, 2012 at 8:29 PM, Aldrin Isaac <[email protected]> wrote:
Kireeti,

I'm not clear what difference it makes whether a packet is unicast
forwarded using MAC address or IP address within a subnet

Two important differences:
a) you don't have to know the MAC address if you forward on IP.  I.e., you 
don't have to propagate the ARP to the destination (flood), get the reply, bind 
IP to MAC (ARP table), and maintain ARP binding (timeout, validate, etc.).  The 
first is a real problem; the rest are annoyances that become problems at scale.

(Note that the ARMD WG was created to address this issue, and you know where 
that ended.)

(Note further that this may be hard to do in general, but in the case of an 
orchestrated data center, you have the information about where a given IP 
lives, and you have a control plane (ORACLE) to inform all relevant NVEs.  And 
of course, an overlay to shield the infrastructure from poking its nose into 
your forwarding behavior -- i.e., the infra doesn't care whether you route or 
switch TS traffic.)

b) In the quite common case where all traffic from a TS is IP, you don't have 
to maintain two tables and two forwarding paradigms at the NVE (one for IPs and 
one for MACs).  This is common enough to warrant optimization.

A third difference is that if you have only unicast traffic, you don't have to 
maintain a multicast tree (for flooding).  For some, this is a nice bonus, but 
I know you have a multicast packet or two in your network :-)

as long as
it gets to the intended destination along the most optimal path,
particularly when the price to pay is non-standard behavior
(intra-subnet ARP manglers ;}, etc).  I understand the argument about
the sub-optimal routing from a third site, but when the primary sites
end up aggregating prefixes for scaling reasons that argument falls
off the table.  One way or other the piper gets paid.

One way, the piper gets paid a fair bit more than the other!

In terms of the real world issue of getting there from here --
personally I haven't seen any vendor working towards a standards-based
solution that will allow intra-subnet routing for subnets over
HW/TOR-based PE, let alone intra-subnet routing for subnets that span
across both hypervisor-based PE and TOR-based PE.  This makes me leery
of solutions that can only take us half way there, particularly during
the transition phase.  So if we're talking about network
virtualization based purely on hypervisors, "route IP, bridge non-IP"
may be realistic if you're willing to accept the caveats, but does not
seem to be otherwise.

Good point.  Clearly, this is not a local decision: "route IP, bridge non-IP" 
means that intra-subnet routes are propagated the same way as inter-subnet 
routes, and thus every NVE, h/w or s/w, must be on the same page.

To make this concrete using BGP VPNs, "route IP, bridge non-IP" means all 
routes, intra- and inter-subnet, are propagated as IP VPN routes, and E-VPN 
routes contain MACs without IPs.  "Bridge intra-subnet IP and non-IP, route 
inter-subnet" means inter-subnet routes are propagated as IP VPN routes, and 
intra-subnet routes as E-VPN MAC+IP routes.
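
The two propagation models can be compared with a small sketch. Everything
here is illustrative: the model names, the tuple encoding of a route and the
use of an IPv4 /32 host route are assumptions for the example, not BGP
encodings.

```python
from typing import List, Tuple

Route = Tuple[str, str]  # (route family, human-readable payload)


def advertise(model: str, mac: str, ip: str, subnet: str) -> List[Route]:
    """List the routes an NVE would originate for one attached host."""
    if model == "route-ip-bridge-non-ip":
        # All IP reachability, intra- and inter-subnet, goes out as IP VPN
        # routes (assumed here to be a /32 host route); E-VPN carries MAC only.
        return [("ipvpn", f"{ip}/32"), ("evpn", f"mac={mac}")]
    if model == "bridge-intra-route-inter":
        # Inter-subnet reachability as an IP VPN route for the subnet;
        # intra-subnet reachability as an E-VPN MAC+IP route.
        return [("ipvpn", subnet), ("evpn", f"mac={mac},ip={ip}")]
    raise ValueError(f"unknown model: {model}")


for model in ("route-ip-bridge-non-ip", "bridge-intra-route-inter"):
    print(model, advertise(model, "00:11:22:33:44:55", "10.0.0.2", "10.0.0.0/24"))
```

In the first model the IP address never appears in an E-VPN route, which is
what lets an NVE keep a single table for IP traffic.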

We can have a chat off-list on h/w vendors working towards this.  Hopefully, 
others will weigh the above arguments, and support this.  Deployers (like you) 
have a say in this too :-)

Btw, I understand how multicast may be less than efficient when
building both inter and intra subnet trees for the same IP mcast group
that end up overlapping links (maybe even more than twice) -- but I'd
like to hear your take on any other *insolvable* issues with regard to
multicast.

Isn't that enough?  :-)  I am not a multicast expert, but I can try to dig up 
IRB multicast horror stories.

Cheers,
Kireeti.

Best regards -- aldrin

On Tue, Dec 18, 2012 at 6:06 PM, Kireeti Kompella <[email protected]> wrote:
> Hi Thomas,
>
> On Dec 18, 2012, at 09:03, Thomas Narten <[email protected]> wrote:
>
>> Kireeti Kompella <[email protected]> writes:
>>
>>> The solution is simple: route if IP, bridge if not.  Yes, one could
>>> do IRB, but why?  IRB brings in complications, especially for
>>> multicast.  I'm sure someone suggested this already, so put me down
>>> as supporting this view.
>>
>> I'm not sure I understand the difference.
>>
>> From an *NVE* perspective, when it receives a packet (which will have
>> an L2 header), it can look at the Ethertype, and if it's IP, it can
>> route it. Otherwise, it can provide normal L2 service. So, in this
>> sense, "route if IP, bridge if not" is straightforward. And more to
>> the point, I assume that if the packet gets L2 service, the entire VN
>> is treated as a *single* broadcast domain. All nodes can reach all
>> other nodes. Right?
>
> Right.
>
>> Just so I understand, how is this different than IRB?  What does IRB
>> imply that the above does not?
>
> IRB follows the principle of "bridge when you can, route otherwise".  So, an 
> IP packet with dest IP in the same subnet actually gets bridged; the 
> originator (e.g., the VM) is responsible for ARPing the IP address, slapping 
> the right dest MAC on the packet and sending that to the NVE which simply 
> forwards based on dest MAC address *without* decrementing the TTL.
>
> If the dest IP is in another subnet, the packet is sent to the gateway (which 
> for IRB would be the same NVE), which this time does an IP address lookup, 
> decrements TTL and routes the packet.
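
The contrast between the two principles can be sketched as two decision
functions. This is a simplification under stated assumptions: the Packet type
and function names are hypothetical, the IRB decision is modeled by checking
the destination against the sender's subnet (a real NVE keys on the dest MAC
chosen by the originator), and routed packets are assumed to have their TTL
decremented.

```python
import ipaddress
from dataclasses import dataclass
from typing import Tuple

IP_ETHERTYPES = (0x0800, 0x86DD)  # IPv4, IPv6


@dataclass
class Packet:
    ethertype: int
    dst_ip: str = ""
    ttl: int = 0


def irb_forward(pkt: Packet, sender_subnet: str) -> Tuple[str, int]:
    """'Bridge when you can, route otherwise'."""
    if pkt.ethertype not in IP_ETHERTYPES:
        return ("bridge", pkt.ttl)
    net = ipaddress.ip_network(sender_subnet)
    if ipaddress.ip_address(pkt.dst_ip) in net:
        # Same subnet: forwarded on dest MAC, TTL untouched.
        return ("bridge", pkt.ttl)
    # Other subnet: IP lookup at the gateway, TTL decremented.
    return ("route", pkt.ttl - 1)


def route_if_ip_forward(pkt: Packet) -> Tuple[str, int]:
    """'Route if IP, bridge if not'."""
    if pkt.ethertype in IP_ETHERTYPES:
        return ("route", pkt.ttl - 1)
    return ("bridge", pkt.ttl)


same_subnet = Packet(0x0800, "10.0.0.5", 64)
print(irb_forward(same_subnet, "10.0.0.0/24"))
print(route_if_ip_forward(same_subnet))
```

The two functions differ only for IP packets whose destination is in the
sender's subnet: IRB bridges them, while the other model routes them.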
>
> For multicast, there are even more differences.
>
>> But this is different than what (I believe) Lucy is arguing for. In
>> the case of a multi-subnet VN, you have one VN, but it contains
>> different subnets. Each subnet is intended to be one broadcast domain
>> (i.e., equivalent of a VLAN), so that when sending LL multicast and
>> the like on a specific subnet, such packets are *not* delivered to all
>> nodes in the VN, but only those that are part of that subnet.
>
> If one were to configure multiple subnets on a VLAN, I wonder if LL traffic 
> goes to all members of the VLAN, or just those in the same subnet as the 
> sender.  I suspect the former (but don't know).
>
>> This is a more complex type of service to provide. And I'm not sure we
>> need this type of service to be provided by one VN.
>
> Agree.
>
>> A (seemingly
>> simpler) alternative would be to put each subnet in its own VN and
>> allow inter-subnet traffic to be handled as inter-VN traffic. So long
>> as that case is optimized (i.e., the ingress NVE can tunnel directly
>> to the egress NVE without adding triangular routing), this would seem
>> to be a cleaner way to implement this.
>
> Can be done.  However, we're on Lucy's topic; mine was "route if IP, bridge 
> otherwise"; the goal was to rationalize the need for Layer 2 forwarding for 
> non-IP traffic, and inter- and intra-subnet routing.
>
> Kireeti.
>
>> Thomas
>>
>> _______________________________________________
>> nvo3 mailing list
>> [email protected]
>> https://www.ietf.org/mailman/listinfo/nvo3

--
Kireeti

