Hi Gyan

Thank you for your detailed analysis.

I agree with you that there are already approaches to troubleshooting the use 
cases mentioned in the draft.
However, the existing mechanisms have some disadvantages from a 
maintenance person's point of view.


1.    Different vendors may have their own mechanisms for troubleshooting. A 
maintenance person could spend a lot of time getting used to these mechanisms. 
Also, for a method that involves inter-device cooperation, such as detecting 
the root of route oscillations, devices from different vendors are very likely 
not to be able to cooperate.

2.    For routing protocols that have built-in mechanisms for troubleshooting, 
such as for RSVP-TE setup failure, there are some scalability issues. If further 
troubleshooting mechanisms were introduced, an extension would be necessary. 
If multiple routing protocols face a common problem, each of them needs 
to be extended. There could be a lot of duplicated work.

3.    For issues like No-Advertise, you can certainly check the configuration 
logs to locate the problem. But this may require maintenance personnel to 
check all devices one by one. With an east-west method, we could get 
that information on a single device. This is actually one of the original 
intentions of PASP.

So, we're not trying to re-invent existing troubleshooting methods. PASP can 
work with existing mechanisms to achieve better results.

Best.
Zhen

From: Gyan Mishra [mailto:hayabusa...@gmail.com]
Sent: Sunday, April 2, 2023 1:06 PM
To: Les Ginsberg (ginsberg) <ginsberg=40cisco....@dmarc.ietf.org>
Cc: Robert Raszuk <rob...@raszuk.net>; rtgwg@ietf.org; tanzhen (A) 
<tanzh...@huawei.com>
Subject: Re: Comments on draft-li-rtgwg-protocol-assisted-protocol


Hi Zhen

I read the draft and I think the problem statement is not completely accurate 
as far as the gap with existing methods of troubleshooting is concerned. Some 
feedback below.

Logging into the router, existing telemetry tools, BMP, NETCONF/YANG, home-grown 
tools, and vendor tools provide a fairly comprehensive approach to 
troubleshooting and have been around for years.

1. BGP route oscillation.
Vendor tools, CLI commands, and route analytics tools on the 
market can easily get to the root of route oscillations.

2. RSVP-TE setup failure
On most vendor CLIs, show traffic engineering tunnel xyz shows the exact reason 
for the failure. In addition, misconfiguration or link failures on all paths can 
result in the tunnel going down or failing over to the FRR backup path.

3. Peer disconnect
The NOC would have access to all devices, even if they are halfway around the 
world. If the peers are in different admin domains, PASP would more than likely 
have to be agreed upon between them, which may not be possible.
As a first step, NMS telemetry would immediately detect the disconnect; the 
next steps are checking logs and traps and diagnosing further by logging into 
the router.

4. Route interruptions
These are rarer and hard to troubleshoot. They may be timing-based (time of day) 
and may even be hard for PASP to detect.

5. No advertise
This is usually due to a network change, a recent misconfiguration, or a down 
node advertising the prefix. One can check the logs for the last configuration 
change to the policy and quickly roll back to the last known good config state.

6. Route abnormal
A route leak or route hijack can be detected by BMP and routing analytics.

Also, in many cases the problem is not so simple to troubleshoot, especially 
with intermittent problems, and I think it would be very difficult or even 
impossible to develop a protocol to troubleshoot those types of complex issues.

This new transport protocol seems like a lot of work to support, and it seems to 
put a lot of overhead on each router, which has to process the additional 
messaging.

Kind Regards

Gyan

On Thu, Mar 9, 2023 at 4:04 PM Les Ginsberg (ginsberg) 
<ginsberg=40cisco....@dmarc.ietf.org> wrote:
(Changed the subject to differentiate from all the other “slot requests”)

+1 to what Robert has said.

We already have multiple ways to provide information to any entity that is 
interested – adding yet another transport doesn’t really help. It just burdens 
implementations with even more transports to support.
What we need is better usage of the information already available.

   Les

From: rtgwg <rtgwg-boun...@ietf.org> On Behalf Of Robert Raszuk
Sent: Thursday, March 9, 2023 1:10 AM
To: tanzhen (A) <tanzh...@huawei.com>
Cc: rtgwg@ietf.org
Subject: Re: Slot Request for RTGWG IETF 116

Hi,

> O&M personnel could actively request more information of other devices

But this is what every decently operating NOC is already doing today. They have 
a bunch of tools, external or home-grown, which go and query the network for 
specific information.

I am not clear why we need a new protocol for this.

Note that this is a massive undertaking to keep track of what can be received 
across a zoo of vendors and, even within each vendor, variants of operating 
systems.

Best,
R.

On Thu, Mar 9, 2023 at 3:38 AM tanzhen (A) <tanzh...@huawei.com> wrote:
Hi Robert,
Please see my reply inline.

From: Robert Raszuk [mailto:rob...@raszuk.net]
Sent: March 8, 2023 21:38
To: tanzhen (A) <tanzh...@huawei.com>
Cc: yingzhen.i...@gmail.com; rtgwg-cha...@ietf.org; rtgwg@ietf.org
Subject: Re: Slot Request for RTGWG IETF 116


Hi,

I have two questions:

1.

The draft says that the action is triggered by an event which is defined by 
configuration: "configured troubleshooting triggering condition."

If so, that only covers a very small subset of possible anomalies. Moreover, 
this would only cover anomalies which are known a priori.

[T.Z.]: The initial purpose of this protocol is to obtain network-wide O&M 
information when a network failure occurs, which helps locate the failure. 
Besides the pre-configured condition, which triggers automatically, O&M 
personnel could actively request more information from other devices with PASP. 
More use cases will be added to the draft later this week.

2.

The draft focuses on control plane troubleshooting. Well, most protocols have 
built-in mechanisms for that. Instead, the most interesting failures happen 
not in the control plane but in the data plane. So do you plan to refocus, or 
add the ability to self-trigger actions based on various data plane events?

[T.Z.]: Thank you for your advice. For now, we're focused on troubleshooting 
the control plane. The data plane scenario is a good direction for future 
expansion.

Many thx,
Robert

On Wed, Mar 8, 2023 at 7:25 AM tanzhen (A) 
<tanzhen6=40huawei....@dmarc.ietf.org> wrote:
Hi Yingzhen,

I would like to request a 10-minute slot for the Protocol Assisted Protocol 
(PASP) draft:
https://datatracker.ietf.org/doc/draft-li-rtgwg-protocol-assisted-protocol/
An updated version of this draft will be uploaded later this week.
Presenter:  Zhen Tan
Thanks.
Zhen

_______________________________________________
rtgwg mailing list
rtgwg@ietf.org
https://www.ietf.org/mailman/listinfo/rtgwg
--

Gyan Mishra

Network Solutions Architect

Email gyan.s.mis...@verizon.com

M 301 502-1347
