[bess] Re: [EXTERNAL] Re: Re: Inverse multi-layer OAM

Greg Mirsky Wed, 19 Mar 2025 21:01:54 -0700

Hi Himanshu,
I agree with Joel that inverting multi-layer OAM is a tricky and untested
proposal. Consider the usual multi-layer OAM arrangement. Link failure
detection is within 10 ms using 3.3 ms intervals. You stressed that e2e
uses more aggressive network failure detection. Would that be based on 1 ms
intervals for multi-hop BFD? AFAIK, in the usual multi-layer OAM, the e2e
network failure detection is based on 100 ms to ensure that the local
protection mechanism can converge without firing e2e recovery. However, in
the case of the inverse multi-layer OAM you presented, it appears that both
recovery mechanisms, i.e., local and e2e, will be deployed. In my opinion,
that is inefficient, confusing, and unnecessary. Am I missing something
here?
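
As a rough illustration of the timing argument above (a sketch only, not
taken from the draft): the Python below assumes a detect multiplier of 3, so
that a 3.3 ms interval gives roughly 10 ms detection, treats the 100 ms e2e
figure as a detection time, and uses a hypothetical 50 ms local repair time.
The function names are made up for this example.

```python
def detection_time_ms(tx_interval_ms: float, detect_multiplier: int = 3) -> float:
    """Detection time as interval x multiplier (multiplier of 3 is an assumption)."""
    return tx_interval_ms * detect_multiplier


def layers_race(link_detect_ms: float, local_repair_ms: float, e2e_detect_ms: float) -> bool:
    """True if e2e detection can fire before local protection has finished.

    Local protection is taken to complete at link detection time plus the
    local repair time; if e2e detects at or before that point, both recovery
    mechanisms may act on the same failure.
    """
    return e2e_detect_ms <= link_detect_ms + local_repair_ms


# Link layer: 3.3 ms interval, roughly 10 ms detection.
link_detect = detection_time_ms(3.3)

# Hypothetical local repair time (e.g., switching to a backup path).
LOCAL_REPAIR_MS = 50.0

# Usual arrangement: e2e detection at 100 ms (assumed to be a detection time).
usual_races = layers_race(link_detect, LOCAL_REPAIR_MS, 100.0)

# Inverse arrangement: e2e probes at 1 ms intervals, same assumed multiplier.
inverse_e2e_detect = detection_time_ms(1.0)
inverse_races = layers_race(link_detect, LOCAL_REPAIR_MS, inverse_e2e_detect)

print(f"link detection:  {link_detect:.1f} ms")
print(f"usual e2e races local repair:   {usual_races}")
print(f"inverse e2e races local repair: {inverse_races}")
```

With these assumed numbers, the usual arrangement leaves local protection
enough time to converge before e2e detection, while the inverse arrangement
has e2e detection firing first and the link-level detection still firing
afterwards.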


Regards,
Greg

On Thu, Mar 20, 2025 at 10:46 AM Shah, Himanshu <[email protected]> wrote:

> Disagree.
>
> We have discussed the motivation (for prioritizing e2e protection over
> local protection) in the draft.
>
> It serves the purpose without having to disable TI-LFA on each node – not
> a desirable option.
>
>
>
> Thanks,
>
> Himanshu
>
>
>
>
>
> *From: *Joel Halpern <[email protected]>
> *Date: *Thursday, March 20, 2025 at 10:41 AM
> *To: *Greg Mirsky <[email protected]>, Robert Raszuk <[email protected]>
> *Cc: *Shah, Himanshu <[email protected]>, BESS <[email protected]>,
> [email protected] <[email protected]>
> *Subject: *[**EXTERNAL**] Re: [bess] Re: Inverse multi-layer OAM
>
> It seems rather counter-intuitive to want to try to repair things
> end-to-end faster than one expects local devices to detect local failures.
> The implied information race conditions seem an invitation to trouble.
>
> Yours,
>
> Joel
>
> On 3/19/2025 11:14 PM, Greg Mirsky wrote:
>
> Hi Robert,
>
> I wholeheartedly agree that local and e2e OAM are complementary tools in
> an operator's toolbox. Usually, a multi-layer OAM is constructed so that
> e2e provides the network with a safety net. In that manner, local repair of
> a link failure is expected to restore services before the failure is
> detected on the e2e level. As I understand it, the proposal uses a
> different scheme. According to it, e2e network detection is expected to be
> more aggressive than the link-level OAM. To me, that's an unusual
> arrangement.
>
> As for performance monitoring, although some performance metrics can be
> measured spatially to compose e2e metrics, e2e performance monitoring is
> easier to deploy in many environments.
>
>
>
> Regards,
>
> Greg
>
>
>
> On Wed, Mar 19, 2025 at 11:21 PM Robert Raszuk <[email protected]> wrote:
>
> Hi Greg,
>
>
>
> I am very much in support of end-to-end path assurance. And by assurance I
> mean not only e2e liveness but also e2e loss, delay, jitter, etc.
>
>
>
> The main reason is that link-layer failure detection (even if done on every
> link in the path) does not provide any information about transit via network
> devices. And those can be subject to packet drops, selective packet drops
> (brownouts), delays and jitter via box fabrics in distributed systems, etc.
> So to me, even if e2e is slower than local link detection, it is still very
> much a preferred way to assure end-to-end path quality.
>
>
>
> Sure, some of this is done at the application layer, but then it is done
> mainly for statistics and reporting. Doing it at the network layer opens up
> the possibility of choosing a different path (quite likely via a different
> provider) when the original path experiences issues or service degradation
> that, with link-by-link failure detection, is invisible to the endpoints.
>
>
>
> I think at the end of the day those two are not really competing solutions
> but complementary ones. And of course end to end makes sense especially in
> deployments where you can have diverse paths end to end.
>
>
>
> Cheers
>
> Robert
>
>
>
> On Wed, Mar 19, 2025 at 4:58 AM Greg Mirsky <[email protected]> wrote:
>
> Hi Himanshu,
>
> Thank you for the presentation of
> draft-karboubi-spring-sidlist-optimized-cs-sr
> <https://datatracker.ietf.org/doc/draft-karboubi-spring-sidlist-optimized-cs-sr/>.
> If I understood your response to Ali correctly, the proposed mechanism is
> expected to use more aggressive network failure detection than the link
> layer. If that is correct, I have several questions about the multi-layer
> OAM:
>
>    - AFAIK link-layer failures are detected within 10 ms using a
>    connectivity check mechanism (CCM of Y.1731 or a single-hop BFD) with a 3.3
>    ms interval.
>    - If the link failure is detectable within 10 ms, what detection time
>    for the path, i.e., E2E connection failure detection, is suggested? What
>    interval between test probes will be used in that case?
>    - Furthermore, even if the path converges around the link failure
>    before the local protection is deployed, the link failure will be detected,
>    and the protection mechanism will be deployed despite the Orchestrator
>    setting up its recovery path in the network. If that is correct, local
>    defect detection and protection are unnecessary overheads. Would you agree?
>
>
>
> Regards,
>
> Greg
>
> _______________________________________________
> BESS mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>
>
>
>

