On 01/12/2021 21:41, Robert Raszuk wrote:
Apologies 2 corrections:
1) s/to their inter-as/ to their inter-area/
2) "service stops for configured PULSE timeout (as discussed 200
sec)." Actually in the described case it is much worse ... Service
stops forever to such area as service layer may not be at all aware
about this kind of false positive !
I don't understand what "service stops" you are talking about. Pulse
will never stop any service. It will at most trigger the switch to
alternate service source. If there is none available, nothing will happen.
Btw this is also not an implementation detail as all multi vendor ABRs
better work in the same manner.
And the robust solution to this case seems to be along the lines of the
logic you have described. PULSES must be acted on by L2 ABRs or by
remote PEs *only* when all sources of the summaries inject identical PULSE.
not really, you can act on the first one and ignore the same pulse from
another source that comes later. Except in the area partition case, the first
pulse is guaranteed to mean the destination is unreachable. In the area
partition case the pulse may trigger the switch to an alternate source of
service, which is a good thing as has been described earlier.
Peter
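A minimal sketch of the receiver behaviour described above, assuming a
PE tracks which prefixes it has already handled a pulse for. The class
and method names are hypothetical and not taken from the draft; the
"switch" is just recorded in a list for illustration.

```python
class PulseReceiver:
    """Acts on the first pulse for a prefix, ignores later copies."""

    def __init__(self):
        self.handled = set()      # prefixes a pulse was already acted on
        self.switchovers = []     # prefixes moved to an alternate source

    def on_pulse(self, prefix, source_abr, has_alternate):
        # The same pulse arriving later from another source is ignored.
        if prefix in self.handled:
            return "ignored"
        self.handled.add(prefix)
        # A pulse never stops a service; at most it triggers a switch.
        if has_alternate:
            self.switchovers.append(prefix)
            return "switched"
        # No alternate service source available: nothing happens.
        return "no-op"
```

Here `source_abr` is carried only to show that the decision does not
depend on which ABR the pulse came from.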
That makes the feature a bit more complex ....
Thx,
R.
On Wed, Dec 1, 2021 at 9:25 PM Robert Raszuk <[email protected]> wrote:
Hi Tony,
I have been thinking about your email a bit more. Actually the
destructive issue you have described can happen not only in the case
of partitioned L1 areas.
*Deployment scenario: *
It is quite often the case that ABRs' intra-area connectivity is
very different to their inter-as connections. That usually means
that different line cards are used to connect to other routers in
the local area than those in the core area.
So when anything happens to the line card which connects L1 (for
example it goes down, there is massive congestion, protocol queue is
full etc ...), then once previously received LSPs expire such an ABR
may trigger a PULSE for all PE routers domain wide. And all the fuses
discussed to prevent massive flooding will not kick in as there may
be just say 10 PEs in the area - all working just fine.
The other ABRs will happily continue to inject summaries but service
stops for configured PULSE timeout (as discussed 200 sec). Note that
it is full service stop not switching to a backup path as all PEs in
the area PULSED domain wide. Not good.
I have not seen any discussion about such a failure case so far. And
only your mail triggered it !
Many thx,
R.
On Wed, Dec 1, 2021 at 5:04 PM Robert Raszuk <[email protected]> wrote:
Hi Tony,
On #2 you are right in the case of src L1 getting partitioned.
Yes it will kill anycast design. If this is showstopper ... not
sure. AFAIK only sourcing ABRs need to keep track about all
links to PE to be down. That requirement does not propagate any
further upstream.
Thx
On Wed, Dec 1, 2021 at 4:58 PM Tony Przygienda <[email protected]> wrote:
1. my question is different. why does the draft say that
seqnr# & IDs have to be preserved between restarts
2. I'm still concerned about L1/L2 hierarchy. If an L2
border sees same prefix negative pulses from two different
L1/L2s it still has to keep state to only pulse into L1
after _all_ the guys pulsed negative (which is basically
impossible since the _negative_ cannot persist it seems).
Now how will it even know that? it has to keep track who
advertised the same summary & who pulsed or otherwise it
will pulse on anyone with a summary giving a pulse and with
that anycast won't work AFAIS and worse you get into weird
situations where you have 2 L1/L2 into same L1 area, one
lost link to reach the PE (arguably L1 got partitioned) and
pulses & then the L1/L2 on the border of the down L1 pulses
and tears the session down albeit the prefix is perfectly
reachable through the other L1/L2. I assume that parses for
the cognoscenti ...
-=--- tony
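The bookkeeping described in point 2 can be sketched as follows. This is
only an illustration, assuming the L2 border keeps its own record of who
advertises each summary and who has pulsed, since the pulse itself is not
retained. All names are hypothetical and nothing here is taken from the
draft.

```python
class SummaryPulseTracker:
    """Propagates a negative pulse only after all advertisers pulsed."""

    def __init__(self):
        self.advertisers = {}  # summary -> set of L1/L2 routers advertising it
        self.pulsed = {}       # (summary, prefix) -> routers that pulsed

    def add_advertiser(self, summary, router):
        self.advertisers.setdefault(summary, set()).add(router)

    def on_negative_pulse(self, summary, prefix, router):
        """Return True once every advertiser of the summary has pulsed."""
        seen = self.pulsed.setdefault((summary, prefix), set())
        seen.add(router)
        advs = self.advertisers.get(summary, set())
        # With no known advertiser there is nothing to compare against.
        return bool(advs) and advs <= seen
```

With two L1/L2 routers advertising the same summary, a pulse from only
one of them returns False, which is the partitioned-L1 case where the
prefix is still reachable through the other router.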
On Wed, Dec 1, 2021 at 4:00 PM Peter Psenak <[email protected]> wrote:
Tony,
On 01/12/2021 15:31, Tony Przygienda wrote:
>
> Or maybe I missed something in the draft or between the lines in the
> whole thing ... Do we assume the negative just quickly tears down the
> BGP session & then it loses any relevance and we rely on BGP to retry
> after reset automatically or something?
yes.
> But then why do we even care about retaining the LSP IDs & SeqNr# would
> I ask?
it's used for the purpose of flooding, so that during the flooding you
do not flood the same pulse LSP multiple times.
thanks,
Peter
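A minimal sketch of that flooding check, assuming a pulse LSP is
identified by its LSP ID and sequence number and that a node remembers
what it has seen for a short time only. The class name, the retention
value and the injectable clock are hypothetical, not taken from the
draft.

```python
import time


class PulseFloodFilter:
    """Remembers recently seen pulse LSPs so each is flooded once."""

    def __init__(self, retain_seconds=60.0, clock=time.monotonic):
        self.retain_seconds = retain_seconds  # assumed value, not from the draft
        self.clock = clock
        self.seen = {}  # lsp_id -> (highest sequence number seen, time seen)

    def should_flood(self, lsp_id, seq_nr):
        now = self.clock()
        # Forget entries that have outlived the retention window.
        self.seen = {k: v for k, v in self.seen.items()
                     if now - v[1] < self.retain_seconds}
        known = self.seen.get(lsp_id)
        if known is not None and seq_nr <= known[0]:
            return False  # same or older pulse: do not flood again
        self.seen[lsp_id] = (seq_nr, now)
        return True
```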
>
> -- tony
>
>
>
>
>
> On Tue, Nov 30, 2021 at 11:19 PM Les Ginsberg (ginsberg)
> <[email protected]> wrote:
>
> Hannes -
>
> Please see
> https://datatracker.ietf.org/doc/html/draft-ppsenak-lsr-igp-event-notification-00#section-4.1
>
> The new Pulse LSPs don't have remaining lifetime - quite intentionally.
> They are only retained long enough to support flooding.
>
> But, you remind me that we need to specify how the checksum is
> calculated. Will do that in the next revision.
>
> Thanx.
>
> Les
>
> > -----Original Message-----
> > From: Hannes Gredler <[email protected]>
> > Sent: Tuesday, November 30, 2021 11:22 AM
> > To: Peter Psenak (ppsenak) <[email protected]>
> > Cc: Robert Raszuk <[email protected]>; Les Ginsberg (ginsberg)
> > <[email protected]>; Aijun Wang <[email protected]>; lsr
> > <[email protected]>; Tony Li <[email protected]>; Shraddha Hegde
> > <[email protected]>
> > Subject: Re: [Lsr] BGP vs PUA/PULSE
> >
> > hi peter,
> >
> > Just curious: Do you have an idea how to make short-lived LSPs compatible
> > with the problem stated in
> > https://datatracker.ietf.org/doc/html/rfc7987
> >
> > Would like to hear your thoughts on that.
> >
> > thanks,
> >
> > /hannes
> >
> > On Tue, Nov 30, 2021 at 01:15:04PM +0100, Peter Psenak wrote:
> > | Hi Robert,
> > |
> > | On 30/11/2021 12:40, Robert Raszuk wrote:
> > | > Hey Peter,
> > | >
> > | > > #1 - I am not ok with the ephemeral nature of the advertisements. (I
> > | > > proposed an alternative).
> > | >
> > | > LSPs have their age today. One can generate LSP with the lifetime of 1
> > | > min. Protocol already allows that.
> > | >
> > | >
> > | > That's a pretty clever comparison indeed. I had a feeling it will come
> > | > up here and here you go :)
> > | >
> > | > But I am afraid this is not comparing apples to apples.
> > | >
> > | > In LSPs or LSA flooding you have a bunch of mechanisms to make sure the
> > | > information stays fresh
> > | > and does not time out. And the default refresh in ISIS if I recall was
> > | > something like 15 minutes ?
> > |
> > | yes, default refresh is 900 for the default lifetime of 1200 sec. Most
> > | people change both to much larger values.
> > |
> > | If I send the LSP with the lifetime of 1 min, there will never be any
> > | refresh of it. It will last 1 min and then will be purged and removed from
> > | the database. The only difference with the Pulse LSP is that it is not
> > | purged to avoid additional flooding.
> > |
> > |
> > | >
> > | > Today in all MPLS networks host routes from all areas are "spread"
> > | > everywhere including all P and PE routers, that's how LS protocols
> > | > distribute data, we have no other way to do that in LS IGPs.
> > | >
> > | >
> > | > Can't you run OSPF over GRE ? For ISIS Henk had proposal not so long ago
> > | > to run it over TCP too.
> > | >
> > | > https://datatracker.ietf.org/doc/html/draft-hsmit-lsr-isis-flooding-over-tcp-00
> > |
> > | you can run anything over GRE, including IGPs, and you don't need TCP
> > | transport for that. I don't see the relevance here. Are you suggesting to
> > | create GRE tunnels to all PEs that need the pulses? Nah, that would be an
> > | ugly requirement.
> > |
> > | thanks,
> > | Peter
> > |
> > |
> > | >
> > | > Seems like a perfect fit !
> > | >
> > | > Thx,
> > | > R.
> > |
>
> _______________________________________________
> Lsr mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/lsr
>