On 2 Sep 2015, at 6:06, Jared Mauch wrote:
You are in the minority; Avi has said that people with a dedicated OOB
network are outnumbered about 50:1, using his most favorable numbers.
Again, to clarify: I count VLANs/VRFs as being sufficiently out-of-band
to handle flow telemetry on a reasonable basis without mixing it in with
customer traffic.
That changes the ratio.
This means that for your one example, there are 50 people not doing this,
and the world hasn’t ended for them. If you aren’t listening to Avi,
please trust me: you don’t need your own OOB network for flow, nor
is putting your flow there going to provide you some magical value.
I agree with you, Avi, and others that a dedicated OOB network *just for
flow telemetry* doesn't make economic sense in most (any?) scenarios.
What I'm saying is that it oughtn't to be mixed in with customer
data-plane traffic. Ideally, all management-plane traffic would
traverse a separate physical infrastructure. Since we don't live in an
ideal world, virtual separation is generally Good Enough.
1:10k sampling works, and you don’t need much more than that unless
you’re at extremely low bitrates. Most attacks last under 1 hour,
and even the small ones shout out in netflow data when you run a simple
hash sort algorithm with the proper keys.
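A minimal sketch of the "hash sort with the proper keys" idea, assuming a hypothetical simplified flow-record type (not a real NetFlow/IPFIX schema) and an assumed key of (destination, protocol, source port), which tends to group reflection traffic aimed at one victim. Records are hashed into buckets, summed, sorted by bytes, and scaled back up by the 1:10k sampling rate. The toy data below is invented for illustration.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple

SAMPLING_RATE = 10_000  # 1:10k packet sampling, as discussed above


class FlowRecord(NamedTuple):
    """Hypothetical minimal sampled-flow record (not a real NetFlow/IPFIX schema)."""
    src: str
    dst: str
    proto: int     # IP protocol number, e.g. 17 = UDP, 6 = TCP
    src_port: int
    packets: int   # sampled packet count
    octets: int    # sampled byte count


Key = Tuple[str, int, int]  # (dst, proto, src_port); an assumed choice of keys


def top_talkers(records: Iterable[FlowRecord], n: int = 5,
                rate: int = SAMPLING_RATE) -> List[Tuple[Key, int, int]]:
    """Hash sampled records into buckets keyed by (dst, proto, src_port),
    sum their counters, sort buckets by bytes, and return the n largest
    with counts scaled back up by the sampling rate."""
    buckets = defaultdict(lambda: [0, 0])  # key -> [packets, octets]
    for r in records:
        key = (r.dst, r.proto, r.src_port)
        buckets[key][0] += r.packets
        buckets[key][1] += r.octets
    ranked = sorted(buckets.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(key, pkts * rate, octs * rate) for key, (pkts, octs) in ranked[:n]]


if __name__ == "__main__":
    # Toy data: three reflectors hitting one victim from UDP/123, plus one
    # unrelated TCP flow. Addresses are from the RFC 5737 documentation ranges.
    sample = [
        FlowRecord("198.51.100.1", "192.0.2.10", 17, 123, 1, 468),
        FlowRecord("198.51.100.2", "192.0.2.10", 17, 123, 1, 468),
        FlowRecord("198.51.100.3", "192.0.2.10", 17, 123, 1, 468),
        FlowRecord("203.0.113.5", "192.0.2.20", 6, 443, 1, 576),
    ]
    for key, est_pkts, est_bytes in top_talkers(sample, n=2):
        print(key, est_pkts, est_bytes)
```

Even though each reflector contributes only one sampled packet, aggregating on the victim-side key makes the attack bucket rise to the top.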
Concur 100%. I spend a lot of time explaining to customers that no,
they don't need/want 1:1 even if they could get it, and that the 'wake'
left by attack traffic stands out very well even at relatively high
sampling ratios.
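Why the 'wake' survives high sampling ratios can be seen with some quick arithmetic. This sketch assumes random sampling in which each packet is picked independently with probability 1/rate (a binomial model; deterministic 1-in-N sampling behaves similarly for large flows), and the attack rates and durations are hypothetical.

```python
import math


def expected_samples(pps: float, duration_s: float, rate: int) -> float:
    """Expected number of sampled packets for traffic of `pps` packets/second
    lasting `duration_s` seconds under 1:rate sampling."""
    return pps * duration_s / rate


def relative_std_error(total_packets: float, rate: int) -> float:
    """Relative standard error of the scaled-up packet count, assuming each
    packet is sampled independently with probability 1/rate (binomial model):
    sqrt(N*p*(1-p)) / (N*p) = sqrt((1-p) / (N*p))."""
    p = 1.0 / rate
    return math.sqrt((1.0 - p) / (total_packets * p))


if __name__ == "__main__":
    rate = 10_000
    # Hypothetical attacks: a small one and a larger one, each lasting 10 minutes.
    for pps in (1_000, 100_000):
        total = pps * 600
        n = expected_samples(pps, 600, rate)
        err = relative_std_error(total, rate)
        print(f"{pps:>7} pps: ~{n:.0f} samples, relative error ~{err:.1%}")
```

Under these assumptions, even the 1,000 pps case yields on the order of 60 samples in ten minutes, with a volume estimate good to roughly ±13%, which is plenty to notice a new top talker.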
Most of the network-oriented folks seem to grasp this pretty quickly.
It's generally the 'security' types who seem
conceptually/attitudinally incapable of understanding these principles.
You can even use QoS to mitigate attack traffic, as most attacks are
UDP-based; see
https://tools.ietf.org/html/draft-byrne-opsec-udp-advisory-00 for some
advice/input.
I know you do this, and I understand why. Not everyone agrees with this
and does it, and I also understand why (not).
NTP is easy, because timesync packets are small and predictably sized,
which gives you a packet-size classification hook. It gets a little
dicier with other protocols.
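A minimal sketch of the packet-size classification hook for NTP. It assumes a legitimate timesync packet carries the 48-byte NTP header (RFC 5905) plus at most an optional MAC; the 72-byte cutoff is an assumption for illustration, and NTPv4 extension fields (for example NTS) make legitimate packets larger, so a real policy would need to account for them. The `UdpPacket` type is hypothetical.

```python
from typing import NamedTuple

NTP_PORT = 123
NTP_HEADER_LEN = 48  # fixed NTP header size in bytes (RFC 5905)
# Assumed cutoff: header plus an optional MAC (4-byte key ID + up to 20-byte digest).
# NTPv4 extension fields (e.g. NTS) make legitimate packets larger than this.
MAX_TIMESYNC_PAYLOAD = NTP_HEADER_LEN + 24


class UdpPacket(NamedTuple):
    """Hypothetical minimal view of a UDP packet for classification."""
    src_port: int
    dst_port: int
    payload_len: int  # UDP payload length in bytes


def classify_ntp(pkt: UdpPacket) -> str:
    """Return 'not-ntp', 'timesync', or 'oversized-ntp' based on port and size."""
    if NTP_PORT not in (pkt.src_port, pkt.dst_port):
        return "not-ntp"
    if pkt.payload_len <= MAX_TIMESYNC_PAYLOAD:
        return "timesync"
    # Much larger than a timesync packet, e.g. mode 7 (monlist) replies used in
    # reflection attacks; a candidate for a QoS class with a strict policer.
    return "oversized-ntp"
```

Most other UDP protocols abused for reflection lack such a clean size signature for legitimate traffic, which is where it gets dicier.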
I’ve shared my own input at recent NANOG meetings, including
policers to keep the attacks under control.
And it's valuable experience to share, nobody disputes that.
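The policers mentioned above are commonly token-bucket based. This is a minimal single-rate, single-bucket sketch (conforming packets pass, excess packets drop), not any particular vendor's implementation; the class name and parameters are hypothetical, and timestamps are passed in explicitly to keep it deterministic.

```python
class TokenBucketPolicer:
    """Single-rate token-bucket policer: tokens (bytes) refill at a fixed rate
    up to a burst size; a packet conforms if enough tokens are available."""

    def __init__(self, rate_bps: float, burst_bytes: int) -> None:
        self.rate_bytes_per_s = rate_bps / 8.0
        self.burst = float(burst_bytes)
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = 0.0                   # timestamp of the previous packet

    def allow(self, now: float, size_bytes: int) -> bool:
        """Refill tokens for the elapsed time, then pass or drop the packet.
        Assumes `now` is non-decreasing across calls."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_bytes_per_s)
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return True   # conforming: forward
        return False      # exceeding: drop


if __name__ == "__main__":
    policer = TokenBucketPolicer(rate_bps=8_000, burst_bytes=1_000)  # 1 kB/s
    for t, size in [(0.0, 500), (0.0, 500), (0.0, 1), (0.5, 400)]:
        print(t, size, policer.allow(t, size))
```

Applied to a narrowly classified class (such as oversized NTP), the policer caps the attack's share of the link while leaving the rest of the traffic alone.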
I’m not talking about the datacenter-class equipment you seem so
focused on, like the Earl7 with the TICO etc., which did software sampling
out of the hardware TCAM and would be overrun.
I'm pretty sure the CRSes I referred to with the linecard-reboot issue
in my example aren't datacenter-class equipment.
;>
What people often don’t see is the true “scale”[1] of netflow. When
you have enough attributes, or want to actually look at your IPv6, there
have been significant shortcomings. We had to remind the patent
holder for netflow how to implement it on their own hardware.
This is very true. IPv6 flow telemetry is another area in which
IPv4/IPv6 feature parity lags. Because of your focus on large-scale
IPv6 deployment over the course of many years, you see and experience a
lot more IPv6-related deficiencies than most folks.
aside: will you be in Yokohama? We should get lunch/dinner.
Yes, and yes.
;>
[1] - I hate this word; vendors use it as an excuse to hardcode limits
and to not properly respond to valid use cases.
Concur 100%.
Another annoying vendor trait is use-case obsession. In many contexts,
the right answer is to understand that there is a baseline plateau of
vitally necessary scaling (that word, again) capacity and required
functionality which is universally applicable, irrespective of
variations in particular use cases.
I recently had a discussion with someone who was asking me how many
attack sources one typically sees in a given DDoS attack. My response
was that there is no 'typical', and that for IPv4 the theoretical
potential is 2^32 sources, while for IPv6 the theoretical potential is
2^128 sources.
It was a light-bulb moment.
;>
-----------------------------------
Roland Dobbins <rdobb...@arbor.net>