Hi Adam/Mel,

Thanks for chiming in!

My understanding was that the tool will combine historic data with the MTBF 
datapoints from all components involved in a given link in order to try to 
estimate the likelihood of a link failure.

Yep. This could indeed be one way. The likelihood could also take the form of 
intervals in which you expect the true value to lie (again, based on 
historical data). This could be done both for link/device failures and for 
external inputs such as BGP announcements (to capture the likelihood that you 
receive a route for X in, say, NEWY). The tool would then run the 
deterministic routing protocols (setting aside ‘features’ such as 
prefer-oldest-route for a second) on these probabilistic inputs so as to infer 
the different possible forwarding outcomes and their relative probabilities. 
This is roughly what we have in mind for now.
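
To make this concrete, here is a minimal Monte Carlo sketch in Python. It is 
an illustration of the idea, not our actual tool: the three-node topology and 
MTBF numbers are made up, failures are assumed to be exponentially distributed 
(so that an MTBF converts to a failure probability over a time window), and 
plain Dijkstra stands in for the real routing protocols.

    import math
    import random
    import heapq
    from collections import Counter

    def failure_prob(mtbf_hours, window_hours):
        # P(at least one failure within the window), exponential model
        return 1.0 - math.exp(-window_hours / mtbf_hours)

    def dijkstra_next_hop(links, src, dst):
        # links: {(u, v): cost} for the surviving bidirectional links
        adj = {}
        for (u, v), cost in links.items():
            adj.setdefault(u, []).append((v, cost))
            adj.setdefault(v, []).append((u, cost))
        dist, pq = {src: 0.0}, [(0.0, src, None)]
        while pq:
            d, node, first_hop = heapq.heappop(pq)
            if d > dist.get(node, float("inf")):
                continue
            if node == dst:
                return first_hop
            for nbr, cost in adj.get(node, []):
                nd = d + cost
                if nd < dist.get(nbr, float("inf")):
                    dist[nbr] = nd
                    heapq.heappush(pq, (nd, nbr, first_hop or nbr))
        return None  # dst unreachable in this sample

    # Toy topology: {(u, v): (igp_cost, mtbf_hours)} -- numbers invented.
    topo = {("NEWY", "CHIC"): (10, 8760.0),
            ("NEWY", "WASH"): (10, 4380.0),
            ("WASH", "CHIC"): (10, 8760.0)}

    outcomes, samples = Counter(), 10000
    for _ in range(samples):
        alive = {lk: c for lk, (c, mtbf) in topo.items()
                 if random.random() > failure_prob(mtbf, 24.0)}
        outcomes[dijkstra_next_hop(alive, "NEWY", "CHIC")] += 1

    for hop, n in outcomes.most_common():
        print(f"NEWY -> CHIC via {hop}: ~{n / samples:.1%}")

In practice you would replace the sampler's inputs with the measured 
distributions and Dijkstra with the full IGP/BGP computation, but the 
structure stays the same.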

One can of course make the model more and more complex, e.g. by also taking 
into account data-plane status (to model gray failures). Intuitively though, 
the more complex the model, the more complex the inference process becomes.

Heck, I imagine that if one were to stream a heap of data at an ML algorithm, 
it might draw some very interesting conclusions indeed, i.e. uncover 
unforeseen patterns across huge datasets while trying to understand the 
overall system (network) behaviour. Such a tool might teach us something new 
about our networks.
The next level would be recommendations on how to best address some of the 
potential pitfalls it found.

Yes. I believe some variants of this already exist, though I’m not sure how 
much they are used in practice. AFAICT, false positives/negatives are still a 
big problem. A non-trivial recommendation system will require a model of the 
network behavior that can somehow be inverted easily, which is probably 
something academics should spend some time on :-)
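
To illustrate the false-positive point: even an off-the-shelf anomaly detector 
shows the trade-off. A small sketch with synthetic (entirely made-up) per-link 
telemetry features and scikit-learn's IsolationForest:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    # Columns: optical RX power (dBm), FEC corrected errors/s, queue depth.
    normal = rng.normal(loc=[-3.0, 50.0, 100.0],
                        scale=[0.5, 10.0, 30.0], size=(5000, 3))
    gray = rng.normal(loc=[-7.0, 500.0, 120.0],
                      scale=[0.5, 50.0, 30.0], size=(10, 3))  # injected
    X = np.vstack([normal, gray])

    clf = IsolationForest(contamination=0.005, random_state=0).fit(X)
    pred = clf.predict(X)  # -1 = anomaly, 1 = normal

    flagged = np.where(pred == -1)[0]
    hits = int(np.sum(flagged >= len(normal)))
    print(f"flagged {len(flagged)} samples; {hits} of 10 injected "
          f"anomalies, the rest false positives")

Tuning the contamination parameter just moves you along the false 
positive/false negative curve, which is exactly the problem above.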

Maybe in closed systems like IP networks, with the use of streaming telemetry 
from SFPs/NPUs/LC-CPUs/protocols/etc., we’ll be able to feed the analytics 
tool enough data to allow it to make fairly accurate predictions (i.e. unlike 
weather or market prediction tools, where the dataset (or rather the search 
space, as not all attributes are equally relevant) is virtually endless).

I’m with you. I also believe that better (even programmable) telemetry will 
unlock powerful analysis tools.
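
As a toy example of what even basic streaming telemetry buys you: a 
per-interface rolling baseline over optical RX power that flags sudden drops. 
The record format and field names below are hypothetical, and a real pipeline 
would subscribe via something like gNMI rather than read stdin:

    import sys
    import json
    from collections import defaultdict, deque

    WINDOW = 300          # samples per interface kept for the baseline
    THRESHOLD_DB = 2.0    # flag drops this far below the rolling mean

    history = defaultdict(lambda: deque(maxlen=WINDOW))

    for line in sys.stdin:
        rec = json.loads(line)        # e.g. {"iface": "et-0/0/1",
        iface = rec["iface"]          #       "rx_power_dbm": -3.2}
        rx = rec["rx_power_dbm"]
        hist = history[iface]
        if len(hist) == WINDOW:
            baseline = sum(hist) / len(hist)
            if rx < baseline - THRESHOLD_DB:
                print(f"{iface}: rx {rx:.1f} dBm is "
                      f"{baseline - rx:.1f} dB below baseline",
                      file=sys.stderr)
        hist.append(rx)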

Best,
Laurent


PS: Thanks a lot to those who have already answered our survey! For those who 
haven’t yet: https://goo.gl/forms/HdYNp3DkKkeEcexs2 (it only takes a couple of 
minutes).