Tony,

On 02/12/2021 11:49, Tony Przygienda wrote:
Idly thinking about the stuff more and more issues pop up that confirm my initial gut feeling that the pulse stuff is simply not what IGP can do reasonably (i.e. liveliness). negative as liveliness indication is arguably even worse ;-) but I think most of us agreed on that across those hundreds of emails by now.

So, to expound a bit. IGP reachability which IGP does normally is _very_ different from liveliness and here's another example (I describe it in principle but people who deployed stuff will know what scenarios I'm talking about)

So, in short, the fact that an IGP, let's say ABR, advertises a summary has _nothing_ much to do with liveliness of what it summarizes in a system-wide sense. More specifically, even when this aggregate goes away or IGP cannot compute _reachability_ to a specific address/node, that does NOT mean that the prefix advertised by such node is not _alive_.

Imagine (often done in fact in deployments I dealt with) that the prefix advertised by a node into IGP is not _reachable_ by IGP all of a sudden, simplest case being a link loss of course. However, it is in the system still reachable by means e.g. of a default route from another protocol or a specific route (static?) over a link IGP is not running on. Now, if IGP starts to pulse it will defeat the very purpose of such backup.

no less specific route will ever make something that went down reachable. The purpose of the pulse is not to defeat the purpose of the default, or less specific route. The purpose of the pulse is to notify interested clients that the reachability of some more specific route (typically a host route) that is covered by the summary in its source area is lost.

If a unique host route that was reachable in its source area became unreachable because its originator became unreachable, we know for sure that the host route is gone no matter what less specific routes may cover it.



And no, you cannot "know" whether backup is here, there are even funky cases where a policy only installs a backup route if the primary went away which may be fast enough to keep e.g. TCP up (whether it's the best possible architecture is disputable but it's a fact of life that such stuff exists).

So, basically we try to invent "liveliness indication" in IGP whereas IGP cannot be aware whether the prefix is reachable system-wide through it even when IGP lost _reachability_.

we can limit the pulse notification to host prefixes. That should address your concern.
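
To illustrate the restriction to host prefixes, here is a minimal sketch in Python. It is not taken from the draft: the helper names `is_host_prefix` and `should_pulse` are hypothetical, and the rule it encodes (pulse only for a lost host route that is covered by a summary the ABR advertises) is an assumption based on the description above.

```python
import ipaddress


def is_host_prefix(prefix):
    """Return True for an IPv4 /32 or an IPv6 /128."""
    net = ipaddress.ip_network(prefix)
    return net.prefixlen == net.max_prefixlen


def should_pulse(lost_prefix, summaries):
    """Hypothetical ABR-side check for a prefix that just became
    unreachable in its source area.

    Assumed rule: pulse only if the lost prefix is a host route and is
    covered by at least one summary this ABR advertises.
    """
    if not is_host_prefix(lost_prefix):
        return False
    lost = ipaddress.ip_network(lost_prefix)
    for summary in summaries:
        s = ipaddress.ip_network(summary)
        # subnet_of() raises TypeError across address families, so
        # compare versions first.
        if lost.version == s.version and lost.subnet_of(s):
            return True
    return False
```

With this rule, losing 10.1.1.1/32 under an advertised 10.1.0.0/16 would trigger a pulse, while losing 10.1.1.0/24 would not.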

thanks,
Peter



And yes, before we go there, I know that with enough "limited domain" and "limited scale" and "limited use case" arguments anything one can imagine "works" ...

--- tony

On Wed, Dec 1, 2021 at 8:13 PM Les Ginsberg (ginsberg) <[email protected] <mailto:[email protected]>> wrote:

    Tony –

    Inline.

    *From:* Tony Przygienda <[email protected]
    <mailto:[email protected]>>
    *Sent:* Wednesday, December 1, 2021 9:33 AM
    *To:* Les Ginsberg (ginsberg) <[email protected]
    <mailto:[email protected]>>
    *Cc:* Peter Psenak (ppsenak) <[email protected]
    <mailto:[email protected]>>; Hannes Gredler <[email protected]
    <mailto:[email protected]>>; lsr <[email protected]
    <mailto:[email protected]>>; Tony Li <[email protected]
    <mailto:[email protected]>>; Aijun Wang <[email protected]
    <mailto:[email protected]>>; Robert Raszuk
    <[email protected] <mailto:[email protected]>>; Shraddha Hegde
    <[email protected] <mailto:[email protected]>>
    *Subject:* Re: [Lsr] BGP vs PUA/PULSE

    "
    Nodes which originate FSP-LSPs MUST
        remember the last sequence number used for a given FSP-LSP and
        increment the sequence number when generating a new version.

        FSP-LSP generation SHOULD utilize the "next" FSP-LSP ID each time new
        pulse information needs to be advertised i.e., if the most recent
        FSP-LSP ID used was A-00.n, the next set of pulse information SHOULD
        be advertised using FSP-LSP ID A-00.n+1.  This minimizes the
        possibility of confusion if other routers in the network have not yet
        removed A-00.n from their LSPDB.
    "

    So you tell me I over-interpreted as "between restarts" ;-) OK, fine. Fair 'nuff. Maybe add one sentence clarification.

    */[LES:] Sure./*

    Otherwise yeah, I'd like the draft to add the "in case of partition things may break but it's not much worse than before" ;-) and "assumption is that the overlay will retry after dropping session on negative so no positives are needed" and I'm ok with this thread.

    */[LES:] I think significantly more needs to be said about the
    current use case for event notification – and this point can be part
    of that. Look for that in the next revision of the draft./*

    my big gripe about "don't do it in main ISIS, take service instance" remains though, due to scalability concerns that a bunch of senior folks here raised already

    */[LES:] I am not in favor of a separate instance in this case.
    Reason being all of the information required to determine when to
    send pulses is already known by the main instance. Moving the pulse
    advertisements themselves to a separate instance would likely be
    more costly in resources on the routers themselves than advertising
    them in the main instance. Scale considerations need to be addressed
    – as has been stated in this and earlier threads many times – and
    that would be true regardless of whether we used the main instance
    or a separate instance./*

    */There is also the point made by Greg Mirsky early on in this
    discussion – that the use of event-notification needs to be
    carefully limited to cases that make sense for the main routing
    instance. The next revision of the draft will also address this
    point./*

    */    Les/*

    -- tony

    On Wed, Dec 1, 2021 at 5:52 PM Les Ginsberg (ginsberg)
    <[email protected] <mailto:[email protected]>> wrote:

        Tony –

        *From:* Tony Przygienda <[email protected]
        <mailto:[email protected]>>
        *Sent:* Wednesday, December 1, 2021 7:58 AM
        *To:* Peter Psenak (ppsenak) <[email protected]
        <mailto:[email protected]>>
        *Cc:* Les Ginsberg (ginsberg) <[email protected]
        <mailto:[email protected]>>; Hannes Gredler <[email protected]
        <mailto:[email protected]>>; lsr <[email protected]
        <mailto:[email protected]>>; Tony Li <[email protected]
        <mailto:[email protected]>>; Aijun Wang <[email protected]
        <mailto:[email protected]>>; Robert Raszuk
        <[email protected] <mailto:[email protected]>>; Shraddha Hegde
        <[email protected] <mailto:[email protected]>>
        *Subject:* Re: [Lsr] BGP vs PUA/PULSE

        1. my question is different. why does the draft say that seqnr#
        & IDs have to be preserved between restarts?

        */[LES:] Section 4.3.1 of the draft tries to answer your
        question – but there is no mention of “restart” there./*

        */There is in fact no mention of restart anywhere in the draft
        other than to say pulses are not preserved across restarts./*

        */We only retain the sequence #’s to make it easier to distinguish
        a new Pulse LSP from a retransmission./*

        2. I'm still concerned about L1/L2 hierarchy. If an L2 border
        sees same prefix negative pulses from two different L1/L2s  it
        still has to keep state to only pulse into L1 after _all_ the
        guys pulsed negative (which is basically impossible since the
        _negative_ cannot persist it seems). Now how will it even know
        that? it has to keep track who advertised the same summary & who
        pulsed or otherwise it will pulse on anyone with a summary
        giving a pulse and with that anycast won't work AFAIS and worse
        you get into weird situations where you have 2 L1/L2 into same
        L1 area, one lost link to reach the PE (arguably L1 got
        partitioned) and pulses & then the L1/L2 on the border of the
        down L1 pulses and tears the session down albeit the prefix is
        perfectly reachable through the other L1/L2. I assume that
        parses for the cognoscenti ...
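
The bookkeeping described in point 2 can be sketched roughly as follows. This is a minimal Python sketch and nothing in it comes from the draft: the class `SummaryPulseTracker` and its methods are hypothetical names. It only shows the state an L2 border would have to hold in order to pulse into L1 after _all_ advertisers of a summary have pulsed. How long a negative could be retained, which is the hard part raised above, is deliberately not modelled.

```python
class SummaryPulseTracker:
    """Hypothetical per-border-router state: who advertises each summary
    and who has pulsed negative for a host prefix under that summary."""

    def __init__(self):
        self._advertisers = {}  # summary -> set of L1/L2 router IDs
        self._pulsed = {}       # (summary, host prefix) -> set of router IDs

    def add_advertiser(self, summary, router_id):
        """Record that router_id advertises the given summary."""
        self._advertisers.setdefault(summary, set()).add(router_id)

    def record_pulse(self, summary, host_prefix, router_id):
        """Record a negative pulse and return True only once every known
        advertiser of the summary has pulsed for this host prefix."""
        pulsed = self._pulsed.setdefault((summary, host_prefix), set())
        pulsed.add(router_id)
        advertisers = self._advertisers.get(summary, set())
        # With no known advertisers there is nothing to conclude.
        return bool(advertisers) and pulsed >= advertisers
```

With two L1/L2 routers advertising the same summary, the first pulse returns False and only the second returns True. A router that pulses on the first negative it sees would tear down a session that is still reachable through the other L1/L2.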

        */[LES:] We are not trying to handle the area partition case./*

        */In such a case, even if nothing is done, traffic will flow via
        both ABRs and half of it will be dropped – so one could argue
        that switching BGP traffic to the backup path is still a good
        idea./*

        */   Les/*

        --- tony

        On Wed, Dec 1, 2021 at 4:00 PM Peter Psenak <[email protected]
        <mailto:[email protected]>> wrote:

            Tony,

            On 01/12/2021 15:31, Tony Przygienda wrote:

             >
             > Or maybe I missed something in the draft or between the
            lines in the
             > whole thing ... Do we assume the negative just quickly
            tears down the
             > BGP session & then it loses any relevance and we rely on
            BGP to retry
             > after reset automatically or something?

            yes.


            But then why do we even care about retaining the LSP IDs &
            SeqNr# would
            I ask?

            it's used for the purpose of flooding, so that during the
            flooding you
            do not flood the same pulse LSP multiple times.
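
As a rough illustration of that flooding use, here is a minimal sketch in Python. The class `PulseFloodFilter` and its `accept` method are hypothetical and not defined by the draft. The sketch assumes a receiver remembers the highest sequence number seen per pulse LSP ID and refloods only a newer one. Sequence-number wrap and the short retention time of pulse LSPs are not modelled.

```python
class PulseFloodFilter:
    """Hypothetical receiver-side duplicate suppression for pulse LSPs."""

    def __init__(self):
        # pulse LSP ID -> highest sequence number seen so far
        self._last_seq = {}

    def accept(self, lsp_id, seq):
        """Return True if this (lsp_id, seq) is new and should be
        flooded on, False if it is a retransmission or an older copy."""
        last = self._last_seq.get(lsp_id)
        if last is not None and seq <= last:
            return False
        self._last_seq[lsp_id] = seq
        return True
```

A second copy of the same pulse LSP arriving over another link is rejected by `accept`, so it is flooded only once by that node.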

            thanks,
            Peter


             >
             > -- tony
             >
             >
             >
             >
             >
             > On Tue, Nov 30, 2021 at 11:19 PM Les Ginsberg (ginsberg)
             > <[email protected]
            <mailto:[email protected]>
             > <mailto:[email protected]
            <mailto:[email protected]>>> wrote:
             >
             >     Hannes -
             >
             >     Please see
             >
            
https://datatracker.ietf.org/doc/html/draft-ppsenak-lsr-igp-event-notification-00#section-4.1
             >
             >     The new Pulse LSPs don't have remaining lifetime -
            quite intentionally.
             >     They are only retained long enough to support flooding.
             >
             >     But, you remind me that we need to specify how the
            checksum is
             >     calculated. Will do that in the next revision.
             >
             >     Thanx.
             >
             >          Les
             >
             >      > -----Original Message-----
             >      > From: Hannes Gredler <[email protected]
            <mailto:[email protected]> <mailto:[email protected]
            <mailto:[email protected]>>>
             >      > Sent: Tuesday, November 30, 2021 11:22 AM
             >      > To: Peter Psenak (ppsenak) <[email protected]
            <mailto:[email protected]>
             >     <mailto:[email protected] <mailto:[email protected]>>>
             >      > Cc: Robert Raszuk <[email protected]
            <mailto:[email protected]> <mailto:[email protected]
            <mailto:[email protected]>>>;
             >     Les Ginsberg (ginsberg)
             >      > <[email protected] <mailto:[email protected]>
            <mailto:[email protected] <mailto:[email protected]>>>;
            Aijun Wang
             >     <[email protected]
            <mailto:[email protected]>
            <mailto:[email protected]
            <mailto:[email protected]>>>; lsr
             >      > <[email protected] <mailto:[email protected]>
            <mailto:[email protected] <mailto:[email protected]>>>; Tony Li
            <[email protected] <mailto:[email protected]>
             >     <mailto:[email protected] <mailto:[email protected]>>>;
            Shraddha Hegde
             >      > <[email protected]
            <mailto:[email protected]> <mailto:[email protected]
            <mailto:[email protected]>>>
             >      > Subject: Re: [Lsr] BGP vs PUA/PULSE
             >      >
             >      > hi peter,
             >      >
             >      > Just curious: Do you have an idea how to make
            short-lived LSPs
             >     compatible
             >      > with the problem stated in
             >      > https://datatracker.ietf.org/doc/html/rfc7987
             >      >
             >      > Would like to hear your thoughts on that.
             >      >
             >      > thanks,
             >      >
             >      > /hannes
             >      >
             >      > On Tue, Nov 30, 2021 at 01:15:04PM +0100, Peter
            Psenak wrote:
             >      > | Hi Robert,
             >      > |
             >      > | On 30/11/2021 12:40, Robert Raszuk wrote:
             >      > | > Hey Peter,
             >      > | >
             >      > | >      > #1 - I am not ok with the ephemeral
            nature of the
             >     advertisements. (I
             >      > | >      > proposed an alternative).
             >      > | >
             >      > | >     LSPs have their age today. One can
            generate LSP with the
             >     lifetime of 1
             >      > | >     min. Protocol already allows that.
             >      > | >
             >      > | >
             >      > | > That's a pretty clever comparison indeed. I
            had a feeling it
             >     will come
             >      > | > up here and here you go :)
             >      > | >
             >      > | > But I am afraid this is not comparing apple to
            apples.
             >      > | >
             >      > | > In LSPs or LSA flooding you have a bunch of
            mechanisms to
             >     make sure the
             >      > | > information stays fresh
             >      > | > and does not time out. And the default refresh
            in ISIS if I
             >     recall was
             >      > | > something like 15 minutes ?
             >      > |
             >      > | yes, default refresh is 900 for the default
            lifetime of 1200
             >     sec. Most
             >      > | people change both to much larger values.
             >      > |
             >      > | If I send the LSP with the lifetime of 1 min,
            there will never
             >     be any
             >      > | refresh of it. It will last 1 min and then will
            be purged and
             >     removed from
             >      > | the database. The only difference with the Pulse
            LSP is that it
             >     is not
             >      > | purged to avoid additional flooding.
             >      > |
             >      > |
             >      > | >
             >      > | >     Today in all MPLS networks host routes
            from all areas are
             >     "spread"
             >      > | >     everywhere including all P and PE routers,
            that's how LS
             >     protocols
             >      > | >     distribute data, we have no other way to
            do that in LS IGPs.
             >      > | >
             >      > | >
             >      > | > Can't you run OSPF over GRE ? For ISIS Henk
            had proposal not
             >     so long ago
             >      > | > to run it over TCP too.
             >      > | >
             >
            
https://datatracker.ietf.org/doc/html/draft-hsmit-lsr-isis-flooding-over-
             >      > tcp-00
             >      > |
             >      > | you can run anything over GRE, including IGPs,
            and you don't
             >     need TCP
             >      > | transport for that. I don't see the relevance
            here. Are you
             >     suggesting to
             >      > | create GRE tunnels to all PEs that need the
            pulses? Nah, that
             >     would be an
             >      > | ugly requirement.
             >      > |
             >      > | thanks,
             >      > | Peter
             >      > |
             >      > |
             >      > | >
             >      > | > Seems like a perfect fit !
             >      > | >
             >      > | > Thx,
             >      > | > R.
             >      > |
             >
             >     _______________________________________________
             >     Lsr mailing list
             > [email protected] <mailto:[email protected]> <mailto:[email protected]
            <mailto:[email protected]>>
             > https://www.ietf.org/mailman/listinfo/lsr


