Hi Jason,

comparing SNMP counters and NetFlow data can be tricky. This is mainly
because SNMP counters are updated in real time, while NetFlow, by
design, involves a sort of "buffering": the first packet seen in a flow
creates a flow structure; then the NetFlow agent accumulates counters
and other state (e.g. TCP flags); finally it exports the flow to the
collector, either because it reckons the flow is closed (e.g. a TCP RST
was seen), or because an inactivity timer expired (these vary from
protocol to protocol), or because it's a long-lived flow (there is
another timer for that), or because, for example, the NetFlow agent is
running out of resources (and this also adds a touch of fun).
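
To make the export conditions above concrete, here is a minimal sketch
of the decision a flow cache might take for each entry. It is not any
vendor's implementation: the FlowEntry type, the export_reason() helper,
the timer defaults and the treatment of FIN alongside RST are all
assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

TCP_FIN = 0x01
TCP_RST = 0x04


@dataclass
class FlowEntry:
    """Hypothetical flow cache entry (illustration only)."""
    first_seen: float   # time of the first packet, in seconds
    last_seen: float    # time of the most recent packet, in seconds
    tcp_flags: int = 0  # OR of all TCP flags seen so far


def export_reason(flow: FlowEntry, now: float,
                  inactive_timeout: float = 15.0,    # assumed value
                  active_timeout: float = 1800.0,    # assumed value
                  cache_full: bool = False) -> Optional[str]:
    """Return why the flow would be exported now, or None to keep it."""
    if flow.tcp_flags & (TCP_FIN | TCP_RST):
        return "closed"      # the agent reckons the flow is over
    if now - flow.last_seen >= inactive_timeout:
        return "inactive"    # no packets for too long
    if now - flow.first_seen >= active_timeout:
        return "active"      # long-lived flow, flushed periodically
    if cache_full:
        return "resources"   # agent is short of room in its cache
    return None
```

Until one of these conditions fires, the bytes of the flow are invisible
to the collector, while the SNMP counter has already moved.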

Then, once flows reach the collector, they are buffered further: pmacct
can do this in two stages to optimize resources and cope with sustained
traffic rates: a) when a flow is handed off/distributed from the core
plugin to the backend (memory, xSQL) plugins, tuned via the
'plugin_buffer_size' directive, and b) in case an xSQL plugin is used,
when a flow is cached inside the plugin waiting to be written to the
database; this is tuned via the 'sql_refresh_time' directive.

Comparing accuracy can get even trickier when enabling the 'sql_history'
feature (e.g. to chop the traffic per IP address into bins of 5
minutes) if any of the eviction timers at the NetFlow agent is larger
than the SQL history timeframe (in production this is not an issue, but
measuring accuracy is a different matter).
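
A small simulation can show the effect. This is only a sketch under a
loud assumption: every exported flow record is credited in full to the
bin that contains its start timestamp. The function names, the constant
traffic rate and the timer values are invented for illustration and are
not taken from pmacct.

```python
from collections import defaultdict

BIN = 300  # history bin size in seconds (5 minutes)


def snmp_bins(rate, start, end):
    """Bytes per bin as a counter updated in real time would see them."""
    bins = defaultdict(int)
    for t in range(start, end):
        bins[t - t % BIN] += rate
    return dict(bins)


def netflow_bins(rate, start, end, active_timeout):
    """Bytes per bin when one long flow is exported every active_timeout
    seconds and each record is credited to the bin of its start time
    (assumption, see the text above)."""
    bins = defaultdict(int)
    t = start
    while t < end:
        chunk_end = min(t + active_timeout, end)
        bins[t - t % BIN] += rate * (chunk_end - t)
        t = chunk_end
    return dict(bins)


if __name__ == "__main__":
    # One flow at 1000 bytes/s lasting 20 minutes, 10 minute active timer.
    print(snmp_bins(1000, 0, 1200))
    print(netflow_bins(1000, 0, 1200, 600))
```

With these numbers the real-time view has 300000 bytes in each of the
four bins, while the flow-based view has 600000 bytes in the bins
starting at 0 and 600 and nothing in the other two. The totals agree,
but individual bins are off by a factor of 2 in one direction and by
100% in the other. With an active timer of 60 seconds the two views
coincide.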

Summarizing:

* you can limit the collector's contribution to the SNMP vs NetFlow
  discrepancy by storing collected data into a memory table
* if testing in a lab without a huge number of concurrent flows, you
  can disable buffering by omitting 'plugin_buffer_size' (by default
  pmacct doesn't buffer)
* make sure the collector doesn't lose any NetFlow datagrams, e.g.
  run pmacct in the foreground (or in the background, logging
  somewhere) and watch out for any suspicious messages
* reduce all the timers at the NetFlow agent to a bare minimum
* take into account that SNMP counters may well count frames rather
  than IP packets; in that case the required math has to be applied
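
For the last point, the math could look like the sketch below. It
assumes plain Ethernet, where each IP packet travels in one frame with a
14 byte header and a 4 byte FCS; whether the FCS, VLAN tags or padding of
short frames are included in the interface counter depends on the
platform, so the overhead is a parameter. The function names are made up
for this example.

```python
ETH_HEADER = 14  # destination MAC + source MAC + EtherType
ETH_FCS = 4      # frame check sequence; counted on some platforms only


def ip_bytes_to_frame_bytes(ip_bytes, packets,
                            l2_overhead=ETH_HEADER + ETH_FCS):
    """Estimate the layer 2 byte count from NetFlow bytes and packets."""
    return ip_bytes + packets * l2_overhead


def frame_bytes_to_ip_bytes(frame_bytes, packets,
                            l2_overhead=ETH_HEADER + ETH_FCS):
    """Estimate the IP byte count from an interface byte counter."""
    return frame_bytes - packets * l2_overhead
```

For instance, 1000 packets carrying 1000000 IP bytes would show up as
1018000 bytes on an interface counter that includes header and FCS.
Small packets make the gap proportionally larger.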

What surprises me a little bit is that the NetFlow counters are greater
than the SNMP counters by a factor of 2. Any chance such flows have been
seen by, say, two agents and thus reported twice to the collector?

Cheers,
Paolo

On Mon, Sep 22, 2008 at 11:48:33AM -0700, Jason Chambers wrote:
> Hello all,
> 
> Great tool, very useful.
> 
> I'm trying to understand the total bytes per time bin collected by nfacctd.
> 
> The problem I have is a calculated bandwidth per time bin (5 minute
> intervals) does not match the calculated bandwidth from SNMP byte counters.
> 
> On one link, NFacct data is always more than what SNMP data reports.  In
> some cases it is by a factor of 2.  On another link it is usually below
> the average bandwidth however there are some instances of the described.
> 
> If anything, I would expect the calculated bandwidth to always be less
> than what SNMP reports since I am limiting NFacct collection to a minb
> value.
> 
> I suspect maybe this is something to do with the accounting time window
> within nfacct and the netflow timers.  I'm looking through the source
> code for clues, but maybe someone has seen this before or can point out
> my mistake ?
> 
> 
> Regards,
> 
> --Jason

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
