Tony,
On 02/12/2021 15:58, Tony Przygienda wrote:
On Thu, Dec 2, 2021 at 2:03 PM Peter Psenak wrote:
Tony,
On 02/12/2021 11:49, Tony Przygienda wrote:
> Idly thinking about this more and more, issues pop up that confirm
> my initial gut feeling that the pulse stuff is simply not something
> the IGP can do reasonably (i.e. liveness). Negative advertisements as
> a liveness indication are arguably even worse ;-) but I think most of
> us agreed on that across those hundreds of emails by now.
>
> So, to expound a bit: IGP reachability, which is what the IGP normally
> computes, is _very_ different from liveness, and here's another
> example (I describe it in principle, but people who have deployed this
> stuff will know what scenarios I'm talking about).
>
> So, in short, the fact that an IGP router, let's say an ABR, advertises
> a summary has next to _nothing_ to do with the system-wide liveness of
> what it summarizes. More specifically, even when this aggregate goes
> away, or the IGP cannot compute _reachability_ to a specific
> address/node, that does NOT mean that the prefix advertised by such a
> node is not _alive_.
>
> Imagine (in fact a common setup in deployments I have dealt with) that
> the prefix advertised by a node into the IGP suddenly becomes
> _unreachable_ by the IGP, the simplest case being a link loss, of
> course. However, it is still reachable in the system, e.g. by means of
> a default route from another protocol or a specific route (static?)
> over a link the IGP is not running on. Now, if the IGP starts to
> pulse, it will defeat the very purpose of such a backup.
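To make the scenario concrete, here is a minimal Python sketch of a router's multi-protocol RIB doing longest-prefix match. The prefixes, next hops, protocol labels and distance values are all hypothetical, and real route selection is vendor-specific; the sketch only shows that losing the IGP route need not make the address unreachable for the router.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    protocol: str  # illustrative source label, e.g. "isis", "static", "bgp"
    distance: int  # illustrative administrative distance; lower is preferred
    next_hop: str


def lookup(rib, addr):
    """Longest-prefix match across all protocols.

    Among routes with the same prefix length, the lowest distance wins.
    Returns None if nothing (not even a default) covers the address.
    """
    ip = ipaddress.ip_address(addr)
    candidates = [r for r in rib if ip in r.prefix]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.prefix.prefixlen, -r.distance))


net = ipaddress.ip_network
rib = [
    Route(net("192.0.2.1/32"), "isis", 115, "10.0.0.1"),    # IGP host route
    Route(net("192.0.2.1/32"), "static", 250, "10.9.9.1"),  # floating static over a non-IGP link
    Route(net("0.0.0.0/0"), "bgp", 20, "10.8.8.1"),          # default from another protocol
]

print(lookup(rib, "192.0.2.1").protocol)  # isis

# Link loss: the IGP withdraws its host route, yet the address still resolves.
after_loss = [r for r in rib if r.protocol != "isis"]
print(lookup(after_loss, "192.0.2.1").protocol)  # static
```

If the floating static were absent too, the same lookup would fall back to the BGP default, which is exactly the kind of backup a pulse cannot know about.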
no less specific route will ever make something that went down
reachable.
we disagree, based on my experience: "went down" in the draft's definition
here only means "the IGP does not see it anymore", unless we want to start
writing LSR drafts that encompass multi-protocol router requirements and
system design.
for all practical purposes, when the IGP cannot see a host route in its
source area, it is considered down. That's what happens today without
summarization.
The purpose of the pulse is not to defeat the purpose of the
default or less specific route. The purpose of the pulse is to notify
interested clients that a more specific route (typically a host route)
covered by the summary has lost reachability in its source area.
If a unique host route that was reachable in its source area became
unreachable because its originator became unreachable, we know for sure
that the host route is gone no matter what less specific routes may
cover it.
so if we intend to inform the service source that "the IGP thinks it cannot
reach an address anymore through an ABR (other ABRs may still reach it
[solving that presupposes, AFAIS, re-inventing add-path in the IGP on
leaking and/or lots of interesting Paxos-direction protocol work], and
maybe other protocols/aggregates can still reach it)", so that the source,
as you say, "may do something", then it slowly dawns on me that we are not
standardizing anything but a "rough hunch of a rumour" that I would never
advise a customer to act upon, given how very expensive a false positive
on a service is, including e.g. BGP re-syncs or tunnel tear-downs.
we are going to use the same "rumour" that is used today when the ABR
loses the /32 in the source area. The only difference is that instead of
stopping the advertisement of the previously advertised /32, we generate
a pulse and keep advertising the summary.
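A rough sketch of the ABR behaviour just described, assuming the ABR knows its configured summary ranges. `AbrState` and `on_source_area_loss` are hypothetical names, and the pulse is modelled as a plain list entry rather than any encoding from the draft.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class AbrState:
    """Hypothetical per-ABR view: configured summaries and what it leaks to other areas."""
    summaries: list
    advertised: set = field(default_factory=set)
    pulses: list = field(default_factory=list)


def on_source_area_loss(abr, prefix):
    """Handle loss of reachability to `prefix` in its source area.

    Without a covering summary the ABR just stops advertising the prefix
    (today's behaviour). With one, the summary stays and a pulse is emitted.
    """
    summary = next((s for s in abr.summaries if prefix.subnet_of(s)), None)
    if summary is None:
        abr.advertised.discard(prefix)
        return "withdraw"
    abr.pulses.append(prefix)  # the summary itself remains in abr.advertised
    return "pulse"


net = ipaddress.ip_network
abr = AbrState(
    summaries=[net("192.0.2.0/24")],
    advertised={net("192.0.2.0/24"), net("198.51.100.7/32")},
)
print(on_source_area_loss(abr, net("198.51.100.7/32")))  # withdraw
print(on_source_area_loss(abr, net("192.0.2.1/32")))     # pulse
print(sorted(str(p) for p in abr.advertised))            # ['192.0.2.0/24']
```

In both branches the trigger is the same source-area reachability loss; only what gets sent to the other areas differs.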
thanks,
Peter
>
> And no, you cannot "know" whether a backup is there; there are even
> funky cases where a policy only installs a backup route if the primary
> went away, which may be fast enough to keep e.g. TCP up (whether it's
> the best possible architecture is debatable, but it's a fact of life
> that such stuff exists).
>
> So, basically, we are trying to invent a "liveness indication" in the
> IGP, whereas the IGP cannot know whether the prefix is still reachable
> system-wide, even when the IGP itself has lost _reachability_ to it.
we can limit the pulse notification to host prefixes. That should
address your concern.
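As an illustration of that restriction, a minimal sketch of the check an ABR might apply before pulsing. The function names and the `host_only` knob are hypothetical; a host prefix is taken to mean an IPv4 /32 or an IPv6 /128.

```python
import ipaddress


def is_host_prefix(prefix: str) -> bool:
    """True for IPv4 /32 and IPv6 /128, i.e. prefixes naming a single address."""
    net = ipaddress.ip_network(prefix)
    return net.prefixlen == net.max_prefixlen


def should_pulse(lost_prefix: str, covered_by_summary: bool, host_only: bool = True) -> bool:
    """Hypothetical ABR policy: pulse only for lost prefixes hidden behind a
    summary, optionally restricted to host prefixes."""
    if not covered_by_summary:
        return False  # no summary: the prefix is simply withdrawn, no pulse needed
    return is_host_prefix(lost_prefix) or not host_only


print(should_pulse("192.0.2.1/32", True))     # True
print(should_pulse("192.0.2.0/28", True))     # False
print(should_pulse("2001:db8::1/128", True))  # True
```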
I would prefer not to add hidden /32 semantics to protocol features,
since the more things become "special" and "asymmetric", the harder it is
to explain how things work & why they don't work as expected when deployed.
-- tony
_______________________________________________
Lsr mailing list
https://www.ietf.org/mailman/listinfo/lsr