Hi Gyan,

Thank you for your detailed analysis.
I agree that there are already approaches to troubleshooting the use cases mentioned in the draft. However, from a maintenance person's point of view, the existing mechanisms have some disadvantages.

1. Different vendors may have their own troubleshooting mechanisms, and a maintenance person could spend a lot of time getting used to them. Also, for a method that involves inter-device cooperation, such as detecting the root of route oscillations, devices from different vendors are very likely unable to cooperate.

2. For routing protocols that have built-in troubleshooting mechanisms, such as RSVP-TE setup failure, there are some scalability issues. If further troubleshooting mechanisms were introduced, an extension would be necessary. If multiple routing protocols face a common problem, each of them needs to be extended, which could mean a lot of duplicated work.

3. For issues like No-Advertise, you can certainly check the configuration logs to locate the problem, but this may require maintenance personnel to check all devices one by one. With an east-west method, we could get that information on a single device. This is actually one of the original intentions of PASP.

So we're not trying to re-invent existing troubleshooting methods. PASP can work with existing mechanisms to achieve better results.

Best,
Zhen

From: Gyan Mishra [mailto:hayabusa...@gmail.com]
Sent: Sunday, April 2, 2023 1:06 PM
To: Les Ginsberg (ginsberg) <ginsberg=40cisco....@dmarc.ietf.org>
Cc: Robert Raszuk <rob...@raszuk.net>; rtgwg@ietf.org; tanzhen (A) <tanzh...@huawei.com>
Subject: Re: Comments on draft-li-rtgwg-protocol-assisted-protocol

Hi Zhen,

I read the draft, and I think the problem statement is not completely accurate as far as the gap with existing troubleshooting methods is concerned. Some feedback below.
Logging into the router, existing telemetry tools, BMP, NETCONF/YANG, home-grown tools, and vendor tools together provide a fairly comprehensive approach to troubleshooting that has been around for years.

1. BGP route oscillation
Vendor tools, CLI commands, and route analytics tools on the market can easily get to the root of route oscillations.

2. RSVP-TE setup failure
Most vendor CLI "show traffic engineering tunnel xyz" output shows the exact reason for the failure. Misconfiguration, as well as link failures on all paths, can result in the tunnel going down or failing over to the FRR backup path.

3. Peer disconnect
The NOC would have access to all devices, even if they are halfway around the world. Between different admin domains, the use of PASP would more than likely have to be agreed upon, which may not be possible. As a first step, NMS telemetry would immediately detect the disconnect; the next steps are checking logs and traps and diagnosing further by logging into the router.

4. Route interruptions
These are rarer and hard to troubleshoot. They may be timing-based (time of day) and may even be hard for PASP to detect.

5. No advertise
This is usually due to a network change, a recent misconfiguration, or a down node that was advertising the prefix. You can check the logs for the last configuration change to policy and quickly roll back to the last known good config state.

6. Route abnormal
A route leak or route hijack can be detected by BMP and routing analytics.

Also, in many cases the problem is not so simple to troubleshoot, especially with intermittent problems, and I think it would be very difficult or even impossible to develop a protocol to troubleshoot those types of complex issues.

This new transport protocol seems like a lot of work to support and a lot of overhead on each router, which has to process the additional messaging.

Kind Regards,
Gyan

On Thu, Mar 9, 2023 at 4:04 PM Les Ginsberg (ginsberg) <ginsberg=40cisco....@dmarc.ietf.org> wrote:

(Changed the subject to differentiate from all the other “slot requests”)

+1 to what Robert has said.
We already have multiple ways to provide information to any entity that is interested; adding yet another transport doesn’t really help. It just burdens implementations with even more transports to support.

What we need is better usage of the information already available.

Les

From: rtgwg <rtgwg-boun...@ietf.org> On Behalf Of Robert Raszuk
Sent: Thursday, March 9, 2023 1:10 AM
To: tanzhen (A) <tanzh...@huawei.com>
Cc: rtgwg@ietf.org
Subject: Re: Slot Request for RTGWG IETF 116

Hi,

> O&M personnel could actively request more information of other devices

But this is what every decently operating NOC is already doing today. They have a bunch of tools, external or home-grown, which go and query the network for specific information. I am not clear why we need a new protocol for this.

Note that this is a massive undertaking: keeping track of what can be received across a zoo of vendors, and even across operating system variants within each vendor.

Best,
R.

On Thu, Mar 9, 2023 at 3:38 AM tanzhen (A) <tanzh...@huawei.com> wrote:

Hi Robert,

Please see my reply inline.

From: Robert Raszuk [mailto:rob...@raszuk.net]
Sent: March 8, 2023 21:38
To: tanzhen (A) <tanzh...@huawei.com>
Cc: yingzhen.i...@gmail.com; rtgwg-cha...@ietf.org; rtgwg@ietf.org
Subject: Re: Slot Request for RTGWG IETF 116

Hi,

I have two questions:

1. The draft says that what triggers the action is an event defined by configuration: "configured troubleshooting triggering condition." If so, that only covers a very small subset of possible anomalies. Moreover, this would only cover anomalies which are known a priori.

[T.Z.]: The initial purpose of this protocol is to obtain network-wide O&M information when a network failure occurs, which helps locate the failure.
Besides the pre-configured condition, which triggers automatically, O&M personnel could actively request more information from other devices with PASP. More use cases will be added to the draft later this week.

2. The draft focuses on control plane troubleshooting. Well, most protocols have built-in mechanisms for that. Instead, the most interesting failures happen not in the control plane but in the data plane. So do you plan to refocus, or to add the ability to self-trigger actions based on various data plane events?

[T.Z.]: Thank you for your advice. For now, we're focused on troubleshooting the control plane. The data plane scenario is a good direction for future expansion.

Many thx,
Robert

On Wed, Mar 8, 2023 at 7:25 AM tanzhen (A) <tanzhen6=40huawei....@dmarc.ietf.org> wrote:

Hi Yingzhen,

I would like to request a 10-minute slot for the Protocol Assisted Protocol (PASP) draft:
https://datatracker.ietf.org/doc/draft-li-rtgwg-protocol-assisted-protocol/

An updated version of this draft will be uploaded later this week.

Presenter: Zhen Tan

Thanks.
Zhen

_______________________________________________
rtgwg mailing list
rtgwg@ietf.org
https://www.ietf.org/mailman/listinfo/rtgwg

--
Gyan Mishra
Network Solutions Architect
Email gyan.s.mis...@verizon.com
M 301 502-1347