Hi Inge,
25k messages in 45 minutes works out to roughly 9-10 messages per second,
which is quite a lot. Which network devices are you receiving NetFlow
datagrams from? One possible reason: sequence checks fail (e.g. they have
been reported to fail with Huawei's implementation of NetFlow), the
resulting flood of log messages consumes a lot of CPU cycles, and as a
result the box is unable to process incoming datagrams at full rate.
Please note that datagrams failing sequence checks are not discarded.

To verify this, you can append the line "nfacctd_disable_checks: true" to
your config. You should then see no further log messages of this kind
and can check whether the graphs show the expected figures.
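
For illustration, a config fragment with the checks disabled might look
like the sketch below. The port and plugin lines are assumptions based on
your log excerpt and description, not your actual config; only the last
line is the one to add.

```
! Sketch only: port and plugins are assumed, adjust to your setup
nfacctd_port: 2100
plugins: memory, mysql
! Turn off NetFlow sequence number checks
nfacctd_disable_checks: true
```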

If you still have the log, could you please send me a larger fragment
privately? I'm curious to see whether there is any evident pattern.
Sequence checks were not implemented in 0.10.3, which would explain why
the warnings stopped after you reverted. Let me know how things work
out. And thanks as usual for your cooperation.
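
If you want a quick look at the pattern yourself, a minimal Python sketch
along these lines could list the gap between the expected and received
sequence numbers per agent. It assumes each warning sits on a single line
in the format you quoted and that the sequence counter is 32 bits wide;
the function name is purely illustrative and not part of pmacct.

```python
import re
from collections import defaultdict

# Matches the nfacctd warning format quoted in this thread (one per line).
WARN_RE = re.compile(
    r"expecting flow '(\d+)' but received '(\d+)'.*?agent=(\S+)"
)


def sequence_gaps(lines):
    """Return {agent: [gap, ...]}, gap = received - expected (mod 2**32)."""
    gaps = defaultdict(list)
    for line in lines:
        match = WARN_RE.search(line)
        if match is None:
            continue  # not a sequence warning, skip it
        expected = int(match.group(1))
        received = int(match.group(2))
        agent = match.group(3)
        # Assumption: 32-bit counter, so the modulo handles wrap-around.
        gaps[agent].append((received - expected) % 2**32)
    return dict(gaps)


# The two warnings from the original report, unwrapped to single lines.
sample = [
    "Mar 14 16:16:23 dump02 nfacctd[9651]: WARN: expecting flow "
    "'3982342489' but received '3982343156' "
    "collector=(null):2100 agent=193.156.90.68:1792",
    "Mar 14 16:16:23 dump02 nfacctd[9651]: WARN: expecting flow "
    "'3982343156' but received '3982343533' "
    "collector=(null):2100 agent=193.156.90.68:1792",
]

result = sequence_gaps(sample)
print(result)
```

Run over the full log, the per-agent lists should make any regularity in
the gaps easier to spot.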

Cheers,
Paolo

On Wed, Mar 14, 2007 at 05:11:50PM +0100, Inge Bjørnvall Arnesen wrote:
> I know this is an oldie, but I'm very conservative when it comes to 
> upgrading. Here is my experience with this problem:
> 
> I've been running nfacctd - pmacct version 0.10.3 - for quite some time. I 
> use a memory plugin with interface to Cricket for real time graph 
> presentation and MySQL logging for batch processing of the stored flows. 
> From time to time I've been executing fairly complex MySQL queries 
> (resulting in high load on the Nfacct host - 2 to 4 - but lots of free CPU 
> time) while nfacct is running and this has been no problem. Around 2.5 hours 
> ago I upgraded to 0.11.3 and then had to make some changes to some MySQL 
> tables, resulting in fairly high load (around 2.5, but still with a lot of 
> CPU left). The result was dramatic during the single hour I had 0.11.3 
> running:
> 
> During the first 15 minutes (when the load was mostly low as I just created 
> some tables for later use) I received 4 messages like the ones below. After 
> starting the MySQL jobs and for the coming 45 minutes I had around 25000 
> messages, all of the form:
> 
> Mar 14 16:16:23 dump02 nfacctd[9651]: WARN: expecting flow '3982342489' but 
> received '3982343156' collector=(null):2100 agent=193.156.90.68:1792
> Mar 14 16:16:23 dump02 nfacctd[9651]: WARN: expecting flow '3982343156' but 
> received '3982343533' collector=(null):2100 agent=193.156.90.68:1792
> 
> I have 3 distinct sources of Netflow/cFlow packets and all three had "lost 
> reports" like this. All plugins had a dramatic decrease in reported flow data 
> for all IPs (my estimate is around 60% lost flow information during these 45 
> minutes). During that time I tried desperately to troubleshoot the possible 
> cause. Finally I gave up and reverted to 0.10.3 (while the MySQL jobs were 
> still running). I received no further warning messages and the Cricket graphs 
> went immediately back to normal while the MySQL jobs continued running with 
> unaltered load (they are still running). 
> 
> There are 0 errors on the receiving interfaces. There were no other recorded 
> network related incidents during that period. I also have another 
> installation on a site with much less traffic and more moderate load on the 
> nfacctd host and there I have recorded one (1) such message with pmacct 0.11.3 
> (which I will happily write off as a lost UDP packet). There are no interface 
> errors on this host either.
> 
> Any ideas on this?
> 
> all the best,
> 
> -- Inge

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
