It seems rather counter-intuitive to want to try to repair things
end-to-end faster than one expects local devices to detect local
failures. The implied information race conditions seem an invitation to
trouble.
Yours,
Joel
On 3/19/2025 11:14 PM, Greg Mirsky wrote:
Hi Robert,
I wholeheartedly agree that local and e2e OAM are complementary tools
in an operator's toolbox. Usually, a multi-layer OAM is constructed so
that e2e provides the network with a safety net. In that manner, local
repair of a link failure is expected to restore services before the
failure is detected on the e2e level. As I understand it, the proposal
uses a different scheme. According to it, e2e network detection is
expected to be more aggressive than the link-level OAM. To me, that's
an unusual arrangement.
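The conventional layering described above can be expressed as a simple timer check. This is only an illustrative sketch: the function name and the 40 ms repair and 100 ms e2e figures are hypothetical values chosen for the example, not numbers from the draft or from this thread.

```python
def local_repair_wins(local_detect_ms: float,
                      local_repair_ms: float,
                      e2e_detect_ms: float) -> bool:
    """Return True if local protection restores service before the
    end-to-end layer would declare the path down."""
    return local_detect_ms + local_repair_ms < e2e_detect_ms


# Conventional arrangement: e2e acts as a safety net (hypothetical timers).
conventional = local_repair_wins(local_detect_ms=10.0,
                                 local_repair_ms=40.0,
                                 e2e_detect_ms=100.0)

# Arrangement as understood from the proposal: e2e detection is more
# aggressive than link-level OAM, so both layers may react to one failure.
aggressive_e2e = local_repair_wins(local_detect_ms=10.0,
                                   local_repair_ms=40.0,
                                   e2e_detect_ms=9.9)

print(conventional, aggressive_e2e)
```

When the check returns False, the e2e layer fires first or at the same time as the local layer, which is the overlapping-reaction case being questioned here.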
As for performance monitoring, although some performance metrics can
be measured spatially to compose e2e metrics, e2e performance
monitoring is easier to deploy in many environments.
Regards,
Greg
On Wed, Mar 19, 2025 at 11:21 PM Robert Raszuk <rob...@raszuk.net> wrote:
Hi Greg,
I am very much in support of end to end path assurance. And by
assurance I mean not only e2e liveness but also e2e loss, delays,
jitter, etc.
The main reason is that link-layer failure detection (even if done on
every link in the path) does not provide any information about transit
via network devices. And those can be subject to packet drops,
selective packet drops (brownouts), delays and jitter via box
fabrics in distributed systems, etc. So to me, even if e2e is
slower than local link detection, it is still very much a
preferred way to assure end to end path quality.
Sure, some of this is done at the application layer, but then it is
done mainly for statistics and reporting. Doing it at the network
layer opens up the possibility of choosing a different path (quite
likely via a different provider) when the original path experiences
issues or service degradation that, with link-by-link failure
detection, is invisible to the endpoints.
I think at the end of the day those two are not really competing
solutions but complementary ones. And of course end to end makes sense
especially in deployments where you can have diverse paths end to end.
Cheers
Robert
On Wed, Mar 19, 2025 at 4:58 AM Greg Mirsky
<gregimir...@gmail.com> wrote:
Hi Himanshu,
Thank you for the presentation of
draft-karboubi-spring-sidlist-optimized-cs-sr
<https://datatracker.ietf.org/doc/draft-karboubi-spring-sidlist-optimized-cs-sr/>.
If I understood your response to Ali correctly, the proposed
mechanism is expected to use more aggressive network failure
detection than the link layer. If that is correct, I
have several questions about the multi-layer OAM:
* AFAIK link-layer failures are detected within 10 ms using
a connectivity check mechanism (CCM of Y.1731 or a
single-hop BFD) with a 3.3 ms interval.
* If the link failure is detectable within 10 ms, what
detection time for the path, i.e., E2E connection failure
detection, is suggested? What interval between test probes
will be used in that case?
* Furthermore, even if the path converges around the link
failure before the local protection is deployed, the link
failure will be detected, and the protection mechanism
will be deployed despite the Orchestrator setting up its
recovery path in the network. If that is correct, local
defect detection and protection are unnecessary overheads.
Would you agree?
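The timing arithmetic behind these questions can be sketched as follows. The sketch assumes failure is declared after a fixed number of consecutive missed probes and uses a multiplier of 3, a common BFD setting that is an assumption here and is not stated in the thread; the function names are hypothetical.

```python
def detection_time_ms(interval_ms: float, multiplier: int = 3) -> float:
    """Time to declare a failure when `multiplier` consecutive probes
    sent every `interval_ms` are missed (multiplier of 3 is assumed)."""
    return interval_ms * multiplier


def max_probe_interval_ms(target_detection_ms: float,
                          multiplier: int = 3) -> float:
    """Largest probe interval that still meets a target detection time."""
    return target_detection_ms / multiplier


# Link layer: 3.3 ms interval gives roughly 10 ms detection.
link_detect = detection_time_ms(3.3)

# For the e2e path to detect a failure no later than the link layer,
# its probe interval could not exceed this value (same multiplier).
e2e_interval_bound = max_probe_interval_ms(link_detect)

print(f"link detection ~{link_detect:.1f} ms, "
      f"e2e probe interval <= {e2e_interval_bound:.1f} ms")
```

Under these assumptions, an e2e scheme that is meant to be more aggressive than a 10 ms link-level check would need probes at an interval of about 3.3 ms or less across the whole path, which is the cost the second question is asking about.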
Regards,
Greg
_______________________________________________
BESS mailing list -- bess@ietf.org
To unsubscribe send an email to bess-le...@ietf.org