On 01/12/2021 21:41, Robert Raszuk wrote:
Apologies 2 corrections:

1)  s/to their inter-as/ to their inter-area/

2)  "service stops for configured PULSE timeout (as discussed 200 sec)."  Actually, in the described case it is much worse: service to such an area stops forever, as the service layer may not be aware of this kind of false positive at all!

I don't understand what "service stops" you are talking about. Pulse will never stop any service. It will at most trigger a switch to an alternate service source. If there is none available, nothing will happen.


Btw, this is also not an implementation detail, as ABRs from multiple vendors had better work in the same manner.

And the robust solution to this case seems to be along the lines of the logic you have described. PULSES must be acted on by L2 ABRs or by remote PEs *only* when all sources of the summaries inject an identical PULSE.

Not really: you can act on the first one and ignore the same pulse from another source that comes later. Except in the area partition case, the first pulse is guaranteed to mean the destination is unreachable. In the area partition case, the pulse may trigger a switch to an alternate source of service, which is a good thing, as has been described earlier.

Peter
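
For illustration only (this is not from the draft or from anyone on the thread): a minimal Python sketch of the "act on the first pulse, ignore the same pulse from another source" behaviour described in the reply above. The class name, the callback, the injectable clock, and the 200 sec hold time are all assumptions made for the example.

```python
import time
from typing import Callable, Dict


class PulseReceiver:
    """Hypothetical receiver that acts on the first pulse seen for a prefix."""

    def __init__(
        self,
        on_unreachable: Callable[[str], None],
        hold_time: float = 200.0,  # assumed; echoes the 200 sec mentioned in the thread
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._on_unreachable = on_unreachable
        self._hold_time = hold_time
        self._clock = clock
        self._first_seen: Dict[str, float] = {}  # prefix -> time of first pulse

    def receive(self, prefix: str, source: str) -> bool:
        """Return True if this pulse was acted on, False if it was ignored.

        `source` (the ABR that sent the pulse) is deliberately not part of the
        decision: the same pulse from a different source is a duplicate.
        """
        now = self._clock()
        seen_at = self._first_seen.get(prefix)
        if seen_at is not None and now - seen_at < self._hold_time:
            return False  # same pulse arriving later, from any source
        self._first_seen[prefix] = now
        self._on_unreachable(prefix)  # e.g. try to switch to an alternate service source
        return True
```

Under these assumptions the receiver keeps only one timestamp per prefix and does not need to know which ABRs advertise the covering summary. The stricter "only when all sources pulsed" rule would need that extra per-source state.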



That makes the feature a bit more complex ....

Thx,
R.

On Wed, Dec 1, 2021 at 9:25 PM Robert Raszuk wrote:

    Hi Tony,

    I have been thinking about your email a bit more. Actually the
    destructive issue you have described can happen not only in the case
    of partitioned L1 areas.

    *Deployment scenario: *

    It is quite often the case that ABRs' intra-area connections are
    very different to their inter-as connections. That usually means
    that different line cards are used to connect to other routers in
    the local area than to those in the core area.

    So when anything happens to the line card that connects to L1 (for
    example it goes down, there is massive congestion, the protocol
    queue is full, etc.), then once the previously received LSPs expire,
    such an ABR may trigger a PULSE for all PE routers domain wide. And
    all the fuses discussed to prevent massive flooding will not kick
    in, as there may be just, say, 10 PEs in the area, all working just
    fine.

    The other ABRs will happily continue to inject summaries but service
    stops for configured PULSE timeout (as discussed 200 sec). Note that
    it is a full service stop, not a switch to a backup path, as all PEs
    in the area were PULSED domain wide. Not good.

    I have not seen any discussion about such a failure case so far. And
    only your mail triggered it !

    Many thx,
    R.



    On Wed, Dec 1, 2021 at 5:04 PM Robert Raszuk wrote:

        Hi Tony,

        On #2 you are right in the case of the src L1 getting partitioned.
        Yes, it will kill the anycast design. Whether this is a
        showstopper ... not sure. AFAIK only the sourcing ABRs need to
        keep track of all links to the PE being down. That requirement
        does not propagate any further upstream.

        Thx

        On Wed, Dec 1, 2021 at 4:58 PM Tony Przygienda wrote:

            1. my question is different. why does the draft say that
            seqnr# & IDs have to be preserved between restarts?

            2. I'm still concerned about L1/L2 hierarchy. If an L2
            border sees same prefix negative pulses from two different
            L1/L2s  it still has to keep state to only pulse into L1
            after _all_ the guys pulsed negative (which is basically
            impossible since the _negative_ cannot persist it seems).
            Now how will it even know that? it has to keep track who
            advertised the same summary & who pulsed or otherwise it
            will pulse on anyone with a summary giving a pulse and with
            that anycast won't work AFAIS and worse you get into weird
            situations where you have 2 L1/L2 into same L1 area, one
            lost link to reach the PE (arguably L1 got partitioned) and
            pulses & then the L1/L2 on the border of the down L1 pulses
            and tears the session down albeit the prefix is perfectly
            reachable through the other L1/L2. I assume that parses for
            the cognoscenti ...

            -=--- tony

            On Wed, Dec 1, 2021 at 4:00 PM Peter Psenak wrote:

                Tony,

                On 01/12/2021 15:31, Tony Przygienda wrote:

                 >
                 > Or maybe I missed something in the draft or between the lines in the
                 > whole thing ... Do we assume the negative just quickly tears down the
                 > BGP session & then it loses any relevance and we rely on BGP to retry
                 > after reset automatically or something?

                yes.


                 > But then why do we even care about retaining the LSP IDs & SeqNr# would
                 > I ask?

                it's used for the purpose of flooding, so that during the flooding you
                do not flood the same pulse LSP multiple times.

                thanks,
                Peter


                 >
                 > -- tony
                 >
                 >
                 >
                 >
                 >
                 > On Tue, Nov 30, 2021 at 11:19 PM Les Ginsberg (ginsberg) wrote:
                 >
                 >     Hannes -
                 >
                 >     Please see
                 >     https://datatracker.ietf.org/doc/html/draft-ppsenak-lsr-igp-event-notification-00#section-4.1
                 >
                 >     The new Pulse LSPs don't have remaining lifetime - quite intentionally.
                 >     They are only retained long enough to support flooding.
                 >
                 >     But, you remind me that we need to specify how the checksum is
                 >     calculated. Will do that in the next revision.
                 >
                 >     Thanx.
                 >
                 >          Les
                 >
                 >      > -----Original Message-----
                 >      > From: Hannes Gredler
                 >      > Sent: Tuesday, November 30, 2021 11:22 AM
                 >      > To: Peter Psenak (ppsenak)
                 >      > Cc: Robert Raszuk; Les Ginsberg (ginsberg); Aijun Wang; lsr;
                 >      > Tony Li; Shraddha Hegde
                 >      > Subject: Re: [Lsr] BGP vs PUA/PULSE
                 >      >
                 >      > hi peter,
                 >      >
                 >      > Just curious: Do you have an idea how to make short-lived LSPs compatible
                 >      > with the problem stated in
                 >      > https://datatracker.ietf.org/doc/html/rfc7987
                 >      >
                 >      > Would like to hear your thoughts on that.
                 >      >
                 >      > thanks,
                 >      >
                 >      > /hannes
                 >      >
                 >      > On Tue, Nov 30, 2021 at 01:15:04PM +0100, Peter Psenak wrote:
                 >      > | Hi Robert,
                 >      > |
                 >      > | On 30/11/2021 12:40, Robert Raszuk wrote:
                 >      > | > Hey Peter,
                 >      > | >
                 >      > | >      > #1 - I am not ok with the ephemeral nature of the advertisements. (I
                 >      > | >      > proposed an alternative).
                 >      > | >
                 >      > | >     LSPs have their age today. One can generate LSP with the lifetime of 1
                 >      > | >     min. Protocol already allows that.
                 >      > | >
                 >      > | > That's a pretty clever comparison indeed. I had a feeling it will come
                 >      > | > up here and here you go :)
                 >      > | >
                 >      > | > But I am afraid this is not comparing apple to apples.
                 >      > | >
                 >      > | > In LSPs or LSA flooding you have a bunch of mechanisms to make sure the
                 >      > | > information stays fresh
                 >      > | > and does not time out. And the default refresh in ISIS if I recall was
                 >      > | > something like 15 minutes ?
                 >      > |
                 >      > | yes, default refresh is 900 for the default lifetime of 1200 sec. Most
                 >      > | people change both to much larger values.
                 >      > |
                 >      > | If I send the LSP with the lifetime of 1 min, there will never be any
                 >      > | refresh of it. It will last 1 min and then will be purged and removed from
                 >      > | the database. The only difference with the Pulse LSP is that it is not
                 >      > | purged to avoid additional flooding.
                 >      > |
                 >      > |
                 >      > | >
                 >      > | >     Today in all MPLS networks host routes from all areas are "spread"
                 >      > | >     everywhere including all P and PE routers, that's how LS protocols
                 >      > | >     distribute data, we have no other way to do that in LS IGPs.
                 >      > | >
                 >      > | >
                 >      > | > Can't you run OSPF over GRE ? For ISIS Henk had proposal not so long ago
                 >      > | > to run it over TCP too.
                 >      > | >
                 >      > | > https://datatracker.ietf.org/doc/html/draft-hsmit-lsr-isis-flooding-over-tcp-00
                 >      > |
                 >      > | you can run anything over GRE, including IGPs, and you don't need TCP
                 >      > | transport for that. I don't see the relevance here. Are you suggesting to
                 >      > | create GRE tunnels to all PEs that need the pulses? Nah, that would be an
                 >      > | ugly requirement.
                 >      > |
                 >      > | thanks,
                 >      > | Peter
                 >      > |
                 >      > |
                 >      > | >
                 >      > | > Seems like a perfect fit !
                 >      > | >
                 >      > | > Thx,
                 >      > | > R.
                 >      > |
                 >
                 >     _______________________________________________
                 >     Lsr mailing list
                 > [email protected]
                 > https://www.ietf.org/mailman/listinfo/lsr
                 >

