Lucy,
1. Constraining the distribution of Leaf A-D routes.
If you look at sections 9.2.3.2.1 and 9.2.3.4.1 of RFC 6514, you'll see
that there are some rules that enable you to avoid sending a Leaf A-D
route on an EBGP session unless a corresponding I/S-PMSI A-D route was
received over that session. There are similar rules in RFC 6514
governing the distribution of C-multicast routes. These rules are
intended to prevent the Leaf A-D routes and C-multicast routes from
being distributed more widely than necessary. Whether these rules
always work is questionable; they tend to have hidden assumptions about
the deployment.
But if you want to investigate ways to optimize the distribution of the
Leaf A-D routes, that's a good place to start.
One might try the following rule. If R1 receives a Leaf A-D route, and
if R1 is not identified in the route's RT, and if the Leaf A-D route has
a route key that is the NLRI of an S-PMSI A-D route that R1 has
installed, then only distribute the Leaf A-D route on a BGP session that
leads to the BGP speaker that is the next hop of the S-PMSI A-D route.
Determining whether this rule actually works in various deployment
scenarios would require further investigation.
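To make the proposed rule concrete, here is a minimal Python sketch of it. Everything in it is hypothetical: the class and function names, the idea that R1 keeps a map from next hop to BGP session, and the choice to fall back to ordinary distribution whenever the rule does not apply. It only illustrates the rule as stated above and says nothing about whether the rule is sufficient in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SPmsiRoute:
    nlri: str       # NLRI of an installed S-PMSI A-D route
    next_hop: str   # BGP next hop of that route


@dataclass(frozen=True)
class LeafADRoute:
    route_key: str  # NLRI of the S-PMSI A-D route this Leaf A-D route answers
    rt_target: str  # router identified by the route's RT


def sessions_for_leaf_ad(local_id, leaf, installed, session_toward, default_sessions):
    """Return the set of BGP sessions on which R1 distributes `leaf`.

    installed:        dict mapping NLRI -> SPmsiRoute (routes R1 has installed)
    session_toward:   dict mapping next hop -> session leading to it (assumed)
    default_sessions: sessions ordinary BGP distribution would use (assumed)
    """
    spmsi = installed.get(leaf.route_key)
    # The rule applies only if R1 is not the RT's target and the route key
    # matches an installed S-PMSI A-D route.
    if leaf.rt_target == local_id or spmsi is None:
        return set(default_sessions)
    session = session_toward.get(spmsi.next_hop)
    if session is None:
        # No known session toward the next hop: fall back (an assumption).
        return set(default_sessions)
    return {session}
```

For example, with one installed S-PMSI A-D route whose next hop is reached over session "S1", a matching Leaf A-D route targeted at some other router would be sent on "S1" only, rather than on every session.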
[Lucy] To suppress unnecessary redistribution, a P-tunnel BGP node
tracks P-tunnel neighbor state. Each BGP next hop is classified as a
P-tunnel downstream neighbor, an upstream neighbor, or N/A. The policy
is: if the BGP next hop of the UPDATE carrying the Leaf A-D route is a
downstream neighbor, redistribute the route; if not, do not redistribute it.
I don't understand this proposal; I don't see how you can tell by
examining the next hop of the Leaf A-D route whether you need to
redistribute the route. A rule based on the next hop of the
corresponding I/S-PMSI A-D route sounds more promising.
Another approach would be to use Constrained Route Distribution. This
would ensure that the Leaf A-D route reaches its target, and would
prevent the route from traveling over "unnecessary" alternate paths. In
certain deployment scenarios, ORF is also available as a way to prevent
routes from being distributed unnecessarily. Both these methods are
forms of RT-based filtering, and both are independent of MVPN.
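The common idea behind both methods can be sketched in a few lines of Python. This is only an illustration of RT-based filtering in general, not of either protocol mechanism: the function name and the `peer_interest` table (which RTs each peer has said it wants) are assumptions made for the example.

```python
def peers_to_send(route_rts, peer_interest):
    """Return the peers that should receive a route carrying `route_rts`.

    peer_interest maps a peer name to the set of RTs that peer has
    expressed interest in (hypothetical representation).
    """
    rts = set(route_rts)
    # A route goes to a peer only if it carries at least one RT the peer wants.
    return {peer for peer, wanted in peer_interest.items() if wanted & rts}


interest = {
    "old-parent": {"RT:old-parent"},
    "new-parent": {"RT:new-parent"},
    "bystander": {"RT:someone-else"},
}
selected = peers_to_send({"RT:new-parent"}, interest)
```

Here `selected` contains only "new-parent"; the filter looks at nothing but the RTs, which is why it is independent of MVPN.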
Of course, one also has to worry about creating a robustness problem if
route distribution is constrained so that routes follow only one path.
Since the topic of this thread is "comment on draft-ietf-bess-ir", and
since that draft is in WGLC, I'll just point out again that this issue
is not specific to ingress replication.
[Lucy] IMO, this mechanism for membership announcement raises a BIG
concern about scalability and performance. Why is it not a concern for you?
I wouldn't say it's not a concern, but it's important not to focus
exclusively on the worst case. Typical deployment scenarios don't come
close to the worst case, and there are various tools and filtering
policies that can be used to constrain the distribution of updates based
on the RTs.
2. Changing your parent on an IR tree
I think we have a disconnect here, having to do with the layering
between the MVPN application and BGP.
MVPN can create a route and give it to BGP. MVPN can set and modify
attributes of the route. MVPN can withdraw the route. But the
distribution of the route is controlled by BGP.
MVPN cannot tell BGP "send an update for NLRI X with attribute A1 on BGP
session S1, but send an update for NLRI X with attribute A2 on BGP
session S2". MVPN cannot tell BGP "send an update for NLRI X on session
S1 but send a withdraw for NLRI X on session S2." And MVPN cannot
control the timing of BGP's route distribution procedures.
In short, MVPN does not create and send the update messages.
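A rough Python sketch of this layering, purely for illustration: the class and method names are invented, and real BGP implementations apply per-session policy that is not modeled here. The point is only that the application-facing interface has no per-session or timing controls.

```python
class BgpSpeaker:
    """Toy model of the interface BGP offers to an application such as MVPN."""

    def __init__(self, sessions):
        self.sessions = list(sessions)
        self.local_routes = {}  # NLRI -> attributes, as handed over by MVPN

    # --- what the application (MVPN) can do ---
    def originate(self, nlri, attrs):
        self.local_routes[nlri] = dict(attrs)

    def modify(self, nlri, **changes):
        self.local_routes[nlri].update(changes)

    def withdraw(self, nlri):
        self.local_routes.pop(nlri, None)

    # --- what BGP decides on its own ---
    def advertisements(self):
        # The application cannot ask for attribute A1 on one session and A2
        # on another; in this simplified model every session sees the same
        # attributes for a given NLRI.
        return {
            s: {nlri: dict(a) for nlri, a in self.local_routes.items()}
            for s in self.sessions
        }


bgp = BgpSpeaker(["S1", "S2"])
bgp.originate("X", {"rt": "old-parent", "label": 100})
bgp.modify("X", rt="new-parent")
adv = bgp.advertisements()
```

After the `modify` call, both "S1" and "S2" carry NLRI X with the new RT; there is no call that would leave the old attributes on one session.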
[Lucy] To change the parent, a child sends out an UPDATE of the Leaf A-D
route with the new parent's address in the RT.
MVPN can tell BGP to change the RT and the PMSI Tunnel attribute on a
given Leaf A-D route. Suppose MVPN replaces the RT so that the RT now
identifies the new parent rather than the old one.
If Constrained Route Distribution is being used, this will cause an
explicit withdraw to be sent to the old parent. There is no way for the
MVPN process in the child node to control the timing of this BGP message.
If Constrained Route Distribution is not being used, changing the RT
will cause BGP to send a new update to the old parent as well as to the
new parent. The old parent will treat this as a replacement route, and
will consider the old route to have been (implicitly) withdrawn. This
behavior is mandated by section 3.1 of RFC 4271. Since the old parent is
not identified in the RT, the action it must take is the action
specified for the withdrawal of a Leaf A-D route.
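The two cases can be sketched from the old parent's point of view as follows. This is a simplified Python model with hypothetical names (`OldParent`, `rib_in`, `children`); it omits the PMSI Tunnel attribute and any timers, and only shows that an explicit withdraw and a replacement route that no longer identifies the parent lead to the same action.

```python
class OldParent:
    def __init__(self, my_id):
        self.my_id = my_id
        self.rib_in = {}        # NLRI -> set of RTs on the current route
        self.children = set()   # NLRIs of Leaf A-D routes that identify us

    def receive_update(self, nlri, rts):
        # An update for an NLRI we already hold replaces the earlier route;
        # the earlier route is thereby implicitly withdrawn.
        self.rib_in[nlri] = set(rts)
        if self.my_id in rts:
            self.children.add(nlri)
        else:
            # We are no longer identified in the RT: act as for a withdrawal.
            self._leaf_withdrawn(nlri)

    def receive_withdraw(self, nlri):
        # Explicit withdraw (the Constrained Route Distribution case).
        self.rib_in.pop(nlri, None)
        self._leaf_withdrawn(nlri)

    def _leaf_withdrawn(self, nlri):
        self.children.discard(nlri)


explicit = OldParent("old-parent")
explicit.receive_update("leaf-1", {"old-parent"})
explicit.receive_withdraw("leaf-1")

implicit = OldParent("old-parent")
implicit.receive_update("leaf-1", {"old-parent"})
implicit.receive_update("leaf-1", {"new-parent"})
```

In both runs the old parent ends with no child for "leaf-1"; the only difference is whether the route is still present in its RIB.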
Now suppose MVPN doesn't simply replace the RT on the Leaf A-D route,
but adds a second RT, identifying the new parent. MVPN would also have
to replace the PMSI Tunnel attribute, to specify a new label for the new
parent to use. The old parent would see this route as a replacement
route. The route still identifies the old parent, but has a new label
in its PMSI Tunnel attribute. So the old parent will continue sending
traffic to the child, but will use the new label. Now both old and new
parents are using the same label, and the result will be data duplication.
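A small Python sketch of why this goes wrong, under the assumption (made only for the example) that a Leaf A-D route can be reduced to a set of RTs plus one label taken from its PMSI Tunnel attribute. The function name and the label values are hypothetical.

```python
def labels_in_use(route, parents):
    """Map each parent identified in the route's RTs to the label it will use."""
    # A route carries a single label, so every parent it identifies uses it.
    return {p: route["label"] for p in parents if p in route["rts"]}


parents = ["old-parent", "new-parent"]

# Before the switch: only the old parent is identified.
before = {"rts": {"old-parent"}, "label": 100}

# After adding a second RT and a new label for the new parent.
after = {"rts": {"old-parent", "new-parent"}, "label": 200}

in_use_before = labels_in_use(before, parents)
in_use_after = labels_in_use(after, parents)
```

`in_use_after` shows both parents sending with label 200, so the child has no way to tell the two copies of the traffic apart.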
I don't think there is any feasible way to switch parents in a "make
before break" fashion without requiring the old parent to have some
explicit knowledge that the switch is taking place. The procedure we
chose is to have the old parent time out the data plane entry for the
child. While this is not the only possible procedure, I don't think
there is anything both (a) simpler and (b) compatible with BGP's route
distribution procedures and with the layering between MVPN and BGP.
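As a rough illustration of the timeout approach, here is a Python sketch of the old parent's side. The class name, the `hold_seconds` parameter, and the explicit `now` argument are all assumptions made for the example; no particular timer value is implied.

```python
class ParentDataPlane:
    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds  # hypothetical timer value
        self.active = set()               # children we currently replicate to
        self.expiry = {}                  # child -> time its entry is removed

    def on_leaf_ad_received(self, child):
        self.active.add(child)
        self.expiry.pop(child, None)      # a fresh route cancels any pending removal

    def on_leaf_ad_withdrawn(self, child, now):
        # Do not remove the entry at once; keep forwarding until the timer
        # expires so the new parent has time to start sending.
        if child in self.active:
            self.expiry[child] = now + self.hold_seconds

    def tick(self, now):
        for child, deadline in list(self.expiry.items()):
            if now >= deadline:
                self.active.discard(child)
                del self.expiry[child]


dp = ParentDataPlane(hold_seconds=30)
dp.on_leaf_ad_received("child-1")
dp.on_leaf_ad_withdrawn("child-1", now=100)
dp.tick(now=110)
still_forwarding = "child-1" in dp.active
dp.tick(now=130)
finally_removed = "child-1" not in dp.active
```

The old parent needs no knowledge that a parent switch is in progress; it reacts only to the withdrawal and to its own timer.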
Of course, one could modify the MVPN/BGP layering by building more
MVPN-specific knowledge into BGP, or one could even decide that section
3.1 of RFC 4271 shouldn't apply to Leaf A-D routes. Certainly there are
some cases where BGP knows that certain MCAST-VPN routes have to be
handled in a special way. But that's a lot more complicated than having
the parent node simply run a timer to time out the data plane states
when a route is withdrawn.
Eric
_______________________________________________
BESS mailing list
BESS@ietf.org
https://www.ietf.org/mailman/listinfo/bess