Hi Edward,

First, thanks for this exhaustive email, very interesting. My first question, to scope it better, is whether you are using any sampling rate and, if so, how much. I ask because I'd intuitively say that if a flow is created from a single sampled packet (which becomes typical for most traffic at high sampling rates such as 1:10k, though not for all of it, e.g. long-lived video), then one cannot, and shouldn't, pro-rate.

Another part to it is that a generic-enough pro-rating algorithm should really work across multiple time-bins in the past. Now that the print plugin supports appending to existing files (1.5.0rc1) and determining which file to write to depending on timestamp (if the newly introduced file_history is specified), I could code something around it and we can pilot it to see whether the results match expectations. Essentially, you find me positive about your point. I'll get back in touch with you privately about this soon - does that sound like a good way forward? If anybody else is interested in this and would like to give it a try, just let me know.

Cheers,
Paolo

On Fri, Aug 30, 2013 at 11:26:08AM -0500, Edward Henigin wrote:
> Hello Paolo,
>
> I'm currently using nfacctd to capture netflow accounting, for the purpose
> of identifying unexpected high traffic flows. (I happen to be using the
> print plugin, and parsing the text files to generate web reports, etc.) The
> netflow exporter in this case is a Cisco RSP720.
>
> One thing I notice is that the accounting data ends up being "bursty." It
> makes sense to me that this is a natural result of the netflow accounting
> architecture on the RSP720, with the fact that flows can be expired at any
> time. In the beginning of a 300-second window, flows may be expired and
> exported which primarily covered the preceding 300-second period, and at
> the end of the current window, all active flows may suddenly be expired and
> exported, causing a "spike" in reported traffic.
> This actually seems to happen quite a bit, here's a random sample of total
> packets/sec and bits/sec for a network segment where the traffic levels are
> actually relatively stable:
>
> Time Ending          Total Kpps   Total Mbps
> 8/30/2013 11:01:33          941         6736
> 8/30/2013 11:00:29          415         2941
> 8/30/2013 10:59:25         1115         7865
> 8/30/2013 10:58:21         1229         9193
> 8/30/2013 10:57:17          127          420
> 8/30/2013 10:56:13         1313         9412
> 8/30/2013 10:55:09          946         6934
>
> (NB the above is using 64-second mls aging and print refresh time, but the
> concept stands regardless of interval length)
>
> So the reason I'm writing is because in a previous life, I used a different
> netflow collector which simply dumped the netflow records to a flat file,
> and I wrote the scripts to aggregate the data. I saw the same burstiness in
> traffic rates due to the nature of netflow. At that time, I employed a
> strategy which seemed to do a very good job of smoothing out the
> burstiness. What I did was to pro-rate the byte & packet counts across time
> intervals.
>
> So for example, if we receive a netflow accounting record, duration 240
> seconds, at 00:06:00, then I would count 1/4 of the packets & bytes to the
> current interval (05:00 - 09:59) and 3/4 of the packets & bytes to the
> prior interval (00:00 - 04:59).
>
> A downside is that you only get "half" of the data for the current
> interval, so full reporting for any given interval is delayed by 1x
> interval length.
>
> I'm interested in applying the pro-rating algorithm to nfacctd. I have no
> idea how I would do that in the code.
>
> Paolo, I'm curious your thoughts in this regard.
>
> Thanks,
>
> Ed
> _______________________________________________
> pmacct-discussion mailing list
> http://www.pmacct.net/#mailinglists
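
For anyone following the thread, the pro-rating idea from the quoted message could be sketched roughly as below. This is only an illustration in Python, not pmacct code: the function name `prorate`, its parameters, and the handling of zero-duration flows (assigned whole to a single bin, in line with the point that a flow built from one sampled packet cannot be pro-rated) are assumptions made for the example. It spreads a flow's counters over as many past time-bins as the flow overlaps.

```python
def prorate(start, end, byte_count, packet_count, bin_size=300):
    """Split a flow's counters across fixed-size time bins.

    start, end: flow start/end times in seconds (hypothetical inputs,
    e.g. derived from the first/last timestamps of a flow record).
    Returns {bin_start_second: (bytes_share, packets_share)}.
    """
    if end <= start:
        # Zero-duration flow (e.g. built from a single sampled packet):
        # there is nothing to pro-rate, so assign it all to one bin.
        return {start // bin_size * bin_size: (byte_count, packet_count)}

    duration = end - start
    shares = {}
    bin_start = start // bin_size * bin_size
    while bin_start < end:
        # Seconds of the flow that fall inside this bin.
        overlap = min(end, bin_start + bin_size) - max(start, bin_start)
        fraction = overlap / duration
        shares[bin_start] = (byte_count * fraction, packet_count * fraction)
        bin_start += bin_size
    return shares


# The example from the quoted message: a 240-second flow ending at 00:06:00
# (second 360), i.e. starting at second 120, with 300-second bins.
shares = prorate(start=120, end=360, byte_count=4000, packet_count=40)
print(shares)  # {0: (3000.0, 30.0), 300: (1000.0, 10.0)}
```

With these numbers, 3/4 of the counters land in the 00:00 - 04:59 bin and 1/4 in the 05:00 - 09:59 bin, which matches the split described in the quoted example. The byte and packet values (4000 and 40) are made up for the illustration.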
