Re: Comments on draft-li-rtgwg-protocol-assisted-protocol

Gyan Mishra Sat, 01 Apr 2023 22:33:57 -0700

Hi Zhen

I read the draft and I think the problem statement is not completely
accurate as far as the gap with existing troubleshooting methods is
concerned. Some feedback below.

Logging into the router, existing telemetry tools, BMP, NETCONF/YANG,
home-grown tools, and vendor tools together provide a fairly comprehensive
approach to troubleshooting, and they have been around for years.

1. BGP route oscillation.
Vendor tools, CLI commands, and the route analytics tools on the market
can easily get to the root cause of route oscillations.

2. RSVP-TE setup failure
On most vendors, a CLI command such as "show traffic engineering tunnel xyz"
shows the exact reason for the failure. In addition, misconfiguration or
link failures on all paths can result in the tunnel going down or failing
over to the FRR backup path.

3. Peer disconnect
The NOC would have access to all devices, even if they are halfway around
the world. If the peers are in different administrative domains, PASP would
more than likely have to be agreed upon between them, which may not be
possible. As a first step, NMS telemetry would immediately detect the
disconnect; the NOC can then check logs and traps and diagnose further by
logging into the router.

4. Route interruptions
These are rarer and harder to troubleshoot. They may be timing-based (for
example, time of day) and may even be hard for PASP to detect.

5. No advertise
This is usually due to a network change, a recent misconfiguration, or a
down node that was advertising the prefix. One can check the logs for the
last configuration change to the policy and quickly roll back to the last
known good config state.

6. Route abnormal
A route leak or route hijack can be detected by BMP and routing analytics.

Also, in many cases the problem is not as simple to troubleshoot,
especially with intermittent problems, and I think it would be very
difficult or even impossible to develop a protocol to troubleshoot those
types of complex issues.

This new transport protocol seems like a lot of work to support, and it
seems to put a lot of overhead on each router, which would have to process
the additional messaging.

Kind Regards

Gyan

On Thu, Mar 9, 2023 at 4:04 PM Les Ginsberg (ginsberg)
<ginsberg=40cisco....@dmarc.ietf.org> wrote:

> (Changed the subject to differentiate from all the other “slot requests”)
>
> +1 to what Robert has said.
>
> We already have multiple ways to provide information to any entity that is
> interested – adding yet another transport doesn’t really help. It just
> burdens implementations with even more transports to support.
>
> What we need is better usage of the information already available.
>
>    Les
>
> *From:* rtgwg <rtgwg-boun...@ietf.org> *On Behalf Of* Robert Raszuk
> *Sent:* Thursday, March 9, 2023 1:10 AM
> *To:* tanzhen (A) <tanzh...@huawei.com>
> *Cc:* rtgwg@ietf.org
> *Subject:* Re: Slot Request for RTGWG IETF 116
>
> Hi,
>
> > O&M personnel could actively request more information of other devices
>
> But this is what every decently operating NOC is already doing today. They
> have a bunch of tools, external or home-grown, which go and query the
> network for specific information.
>
> It is not clear to me why we need a new protocol for this.
>
> Note that this is a massive undertaking: keeping track of what can be
> received across a zoo of vendors, and even across the operating system
> variants within each vendor.
>
> Best,
>
> R.
>
> On Thu, Mar 9, 2023 at 3:38 AM tanzhen (A) <tanzh...@huawei.com> wrote:
>
> Hi Robert,
>
> Please see my reply inline.
>
> *From:* Robert Raszuk [mailto:rob...@raszuk.net]
> *Sent:* March 8, 2023 21:38
> *To:* tanzhen (A) <tanzh...@huawei.com>
> *Cc:* yingzhen.i...@gmail.com; rtgwg-cha...@ietf.org; rtgwg@ietf.org
> *Subject:* Re: Slot Request for RTGWG IETF 116
>
> Hi,
>
> I have two questions:
>
> 1. The draft says that triggering the action is an event which is defined
> by configuration: "configured troubleshooting triggering condition."
>
> If so, that only covers a very small subset of possible anomalies. Moreover,
> this would only cover anomalies which are known a priori.
>
> [T.Z.]: The initial purpose of this protocol is to obtain network-wide O&M
> information when a network failure occurs, which helps locate the failure.
> Besides the pre-configured condition, which triggers automatically, O&M
> personnel could actively request more information from other devices with
> PASP. More use cases will be added to the draft later this week.
>
> 2. The draft focuses on control-plane troubleshooting. Well, most protocols
> have built-in mechanisms for that. Instead, most of the interesting failures
> happen not in the control plane but in the data plane. So do you plan to
> refocus, or to add the ability to self-trigger actions based on various
> data-plane events?
>
> [T.Z.]: Thank you for your advice. For now, we are focused on
> control-plane troubleshooting. The data-plane scenario is a good
> direction for future expansion.
>
> Many thx,
>
> Robert
>
> On Wed, Mar 8, 2023 at 7:25 AM tanzhen (A)
> <tanzhen6=40huawei....@dmarc.ietf.org> wrote:
>
> Hi Yingzhen,
>
> I would like to request a 10-minute slot for the Protocol Assisted Protocol
> (PASP) draft:
> https://datatracker.ietf.org/doc/draft-li-rtgwg-protocol-assisted-protocol/
> An updated version of this draft will be uploaded later this week.
> Presenter: Zhen Tan
>
> Thanks.
>
> Zhen
>
> _______________________________________________
> rtgwg mailing list
> rtgwg@ietf.org
> https://www.ietf.org/mailman/listinfo/rtgwg
>
-- 


*Gyan Mishra*

*Network Solutions Architect*

*Email gyan.s.mis...@verizon.com <gyan.s.mis...@verizon.com>*

*M 301 502-1347*

