At the very least, I would expect to see some explanation of why one would also be running TI-LFA.  And probably a discussion of how this interacts with the information propagation when the local detection kicks in.   I can believe both points can be addressed, but it is hard to understand without them.

Yours,

Joel

On 3/19/2025 11:46 PM, Shah, Himanshu wrote:

Disagree.

We have discussed the motivation (for prioritizing e2e protection over local protection) in the draft.

It serves the purpose without having to disable TI-LFA on each node – not a desirable option.

Thanks,

Himanshu

*From: *Joel Halpern <j...@joelhalpern.com>
*Date: *Thursday, March 20, 2025 at 10:41 AM
*To: *Greg Mirsky <gregimir...@gmail.com>, Robert Raszuk <rob...@raszuk.net>
*Cc: *Shah, Himanshu <hs...@ciena.com>, BESS <bess@ietf.org>, draft-karboubi-spring-sidlist-optimized-cs...@ietf.org <draft-karboubi-spring-sidlist-optimized-cs...@ietf.org>
*Subject: *[**EXTERNAL**] Re: [bess] Re: Inverse multi-layer OAM

It seems rather counter-intuitive to want to try to repair things end-to-end faster than one expects local devices to detect local failures.  The implied information race conditions seem an invitation to trouble.

Yours,

Joel

On 3/19/2025 11:14 PM, Greg Mirsky wrote:

    Hi Robert,

    I wholeheartedly agree that local and e2e OAM are complementary
    tools in an operator's toolbox. Usually, a multi-layer OAM is
    constructed so that e2e provides the network with a safety net. In
    that manner, local repair of a link failure is expected to restore
    services before the failure is detected on the e2e level. As I
    understand it, the proposal uses a different scheme. According to
    it, e2e network detection is expected to be more aggressive than
    the link-level OAM. To me, that's an unusual arrangement.

    As for performance monitoring, although some performance metrics
    can be measured spatially to compose e2e metrics, e2e performance
    monitoring is easier to deploy in many environments.

    Regards,

    Greg

    On Wed, Mar 19, 2025 at 11:21 PM Robert Raszuk <rob...@raszuk.net>
    wrote:

        Hi Greg,

        I am very much in support of end to end path assurance. And by
        assurance I mean not only e2e liveness but also e2e loss,
        delays, jitter etc ...

        The main reason is that link-layer failure detection (even if
        done on every link in the path) does not provide any information
        about transit via network devices. And those can be subject to
        packet drops, selective packet drops (brownouts), delays and
        jitter via box fabrics in distributed systems etc ... So to me,
        even if e2e is slower than local link detection, it is still
        very much a preferred way to assure end-to-end path quality.

        Sure, some of this is done at the application layer, but then
        it is done mainly for statistics and reporting. Doing it at the
        network layer opens up the possibility of choosing a different
        path (quite likely via a different provider) when the original
        path experiences issues or service degradation that, with
        link-by-link failure detection, are invisible to the endpoints.

        I think at the end of the day those two are not really
        competing solutions but complementary. And of course end to
        end makes sense especially in deployments where you can have
        diverse paths end to end.

        Cheers

        Robert

        On Wed, Mar 19, 2025 at 4:58 AM Greg Mirsky
        <gregimir...@gmail.com> wrote:

            Hi Himanshu,

            Thank you for the presentation of
            draft-karboubi-spring-sidlist-optimized-cs-sr
            <https://datatracker.ietf.org/doc/draft-karboubi-spring-sidlist-optimized-cs-sr/>.
            If I understood your response to Ali correctly, the
            proposed mechanism is expected to use more
            aggressive network failure detection than the link layer.
            If that is correct, I have several questions about the
            multi-layer OAM:

              * AFAIK link-layer failures are detected within 10 ms
                using a connectivity check mechanism (CCM of Y.1731 or
                a single-hop BFD) with a 3.3 ms interval.
              * If the link failure is detectable within 10 ms, what
                detection time for the path, i.e., E2E connection
                failure detection, is suggested? What interval between
                test probes will be used in that case?
              * Furthermore, even if the path converges around the
                link failure before the local protection is deployed,
                the link failure will be detected, and the protection
                mechanism will be deployed despite the Orchestrator
                setting up its recovery path in the network. If
                that is correct, local defect detection and protection
                are unnecessary overheads. Would you agree?
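
            The timing in the bullets above can be sketched as a small
            calculation. It assumes detection time equals the transmit
            interval times a detect multiplier, as in BFD asynchronous
            mode. The multiplier of 3 and both function names are
            assumptions for illustration; the thread itself only gives a
            3.3 ms interval and detection within 10 ms.

```python
# Sketch only: assumes detection time = transmit interval x detect multiplier,
# as in BFD asynchronous mode. The multiplier of 3 is an assumption; the
# thread only states a 3.3 ms interval and detection within 10 ms.


def detection_time_ms(tx_interval_ms: float, detect_multiplier: int = 3) -> float:
    """Time to declare a failure once consecutive probes go missing."""
    return tx_interval_ms * detect_multiplier


def max_e2e_interval_ms(link_detection_ms: float, detect_multiplier: int = 3) -> float:
    """Upper bound on the e2e probe interval if e2e detection must not be
    slower than link-level detection (path propagation delay is ignored)."""
    return link_detection_ms / detect_multiplier


link_detection = detection_time_ms(3.3)
e2e_bound = max_e2e_interval_ms(link_detection)
print(f"link-level detection: {link_detection:.1f} ms")
print(f"e2e probe interval must stay below: {e2e_bound:.1f} ms")
```

            Under these assumptions the link-level figure comes to about
            9.9 ms, so an e2e scheme with the same multiplier would need
            a probe interval no longer than the 3.3 ms used on the link,
            before any path delay is counted.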

            Regards,

            Greg

            _______________________________________________
            BESS mailing list -- bess@ietf.org
            To unsubscribe send an email to bess-le...@ietf.org


