Hi Zhen,

I read the draft, and I think the problem statement is not completely accurate as far as the gap with existing troubleshooting methods is concerned. Some feedback below.

Logging into the router, existing telemetry tools, BMP, NETCONF/YANG, home-grown tools, and vendor tools together provide a fairly comprehensive approach to troubleshooting, and they have been around for years.

1. BGP route oscillation
Vendor tools, CLI commands, and route analytics tools on the market can easily get to the root of route oscillations.

2. RSVP-TE setup failure
Most vendor CLIs have a "show traffic engineering tunnel xyz" style command that shows the exact reason for the failure. In addition, misconfiguration or link failures on all paths can result in the tunnel going down or failing over to the FRR backup path.

3. Peer disconnect
The NOC would have access to all devices, even if they are halfway around the world. If the peers are in different admin domains, the use of PASP would more than likely have to be agreed upon, which may not be possible. As a first step, NMS telemetry would immediately detect the disconnect; the operator would then check logs and traps and diagnose further by logging into the router.

4. Route interruptions
These are rarer and hard to troubleshoot. They may be timing based (time of day) and may be hard even for PASP to detect.

5. No advertise
This is usually due to a network change, a recent misconfiguration, or the node advertising the prefix being down. You can check the logs for the last configuration change to policy and quickly roll back to the last known good config state.

6. Route abnormal
A route leak or route hijack can be detected by BMP and routing analytics.

Also, in many cases the problem is not as simple to troubleshoot, especially with intermittent problems, and I think it would be very difficult or even impossible to develop a protocol to troubleshoot those types of complex issues.

This new transport protocol seems like a lot of work to support, and it seems to put a lot of overhead on each router, which has to process the additional messaging.

Kind Regards

Gyan

On Thu, Mar 9, 2023 at 4:04 PM Les Ginsberg (ginsberg) <ginsberg=40cisco....@dmarc.ietf.org> wrote:

> (Changed the subject to differentiate from all the other “slot requests”)
>
> +1 to what Robert has said.
>
> We already have multiple ways to provide information to any entity that is
> interested; adding yet another transport doesn’t really help. It just burdens
> implementations with even more transports to support.
>
> What we need is better usage of the information already available.
>
> Les
>
> *From:* rtgwg <rtgwg-boun...@ietf.org> *On Behalf Of* Robert Raszuk
> *Sent:* Thursday, March 9, 2023 1:10 AM
> *To:* tanzhen (A) <tanzh...@huawei.com>
> *Cc:* rtgwg@ietf.org
> *Subject:* Re: Slot Request for RTGWG IETF 116
>
> Hi,
>
> > O&M personnel could actively request more information of other devices
>
> But this is what every decently operating NOC is already doing today. They
> have a bunch of tools, external or home grown, which go and query the
> network for specific information.
>
> I am not clear why we need a new protocol for this.
>
> Note that this is a massive undertaking: keeping track of what can be
> received across a zoo of vendors and, even within each vendor, variants of
> operating systems.
>
> Best,
> R.
>
> On Thu, Mar 9, 2023 at 3:38 AM tanzhen (A) <tanzh...@huawei.com> wrote:
>
> Hi Robert,
>
> Please see my reply inline.
>
> *From:* Robert Raszuk [mailto:rob...@raszuk.net]
> *Sent:* March 8, 2023 21:38
> *To:* tanzhen (A) <tanzh...@huawei.com>
> *Cc:* yingzhen.i...@gmail.com; rtgwg-cha...@ietf.org; rtgwg@ietf.org
> *Subject:* Re: Slot Request for RTGWG IETF 116
>
> Hi,
>
> I have two questions:
>
> 1.
>
> The draft says that triggering the action is an event which is defined by
> configuration: "configured troubleshooting triggering condition."
>
> If so, that only covers a very small subset of possible anomalies. Moreover,
> this would only cover anomalies which are known a priori.
>
> [T.Z.]: The initial purpose of this protocol is to obtain network-wide O&M
> information when a network failure occurs, which helps locate the failure.
> Besides the pre-configured condition, which triggers automatically, O&M
> personnel could actively request more information of other devices with
> PASP. More use cases will be added to the draft later this week.
>
> 2.
>
> The draft focuses on control plane troubleshooting. Well, most protocols
> have built-in mechanisms for that. Instead, the most interesting failures are
> happening not in the control plane but in the data plane. So do you plan to
> refocus, or add the ability to self-trigger actions based on various data
> plane events?
>
> [T.Z.]: Thank you for your advice. For now, we're focused on
> troubleshooting of the control plane. The data plane scenario is a good
> direction for future expansion.
>
> Many thx,
> Robert
>
> On Wed, Mar 8, 2023 at 7:25 AM tanzhen (A) <tanzhen6=40huawei....@dmarc.ietf.org> wrote:
>
> Hi Yingzhen,
>
> I would like to request a 10 minute slot for the Protocol Assisted Protocol
> (PASP) draft:
> https://datatracker.ietf.org/doc/draft-li-rtgwg-protocol-assisted-protocol/
> An updated version of this draft will be uploaded later this week.
> Presenter: Zhen Tan
>
> Thanks.
>
> Zhen
>
> _______________________________________________
> rtgwg mailing list
> rtgwg@ietf.org
> https://www.ietf.org/mailman/listinfo/rtgwg
>
--
<http://www.verizon.com/>
*Gyan Mishra*
*Network Solutions Architect*
*Email gyan.s.mis...@verizon.com <gyan.s.mis...@verizon.com>*
*M 301 502-1347*