Hi Paolo,

Thanks for your swift response! Here are the answers to your questions, along
with some more precise information. I will send you a gzip of the syslog
messages privately:

- There were 26328 messages of the mentioned type during the high-load period 
with 0.11.3, which lasted 43 minutes.
- No other errors were reported.
- The sources are two interfaces on one Cisco 6500, three interfaces on another 
6500 and two interfaces on a Juniper router. The Ciscos export full flows, while 
the Juniper is sampled 1:1000. There were losses on the reports from each 
interface on every router during this incident.
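
As a rough way to put numbers on these warnings, the sketch below parses lines in the format quoted further down in this thread and reports the warning rate and the summed difference between expected and received sequence numbers per agent. The names `WARN_RE` and `summarize` are made up for illustration, the regular expression assumes the 0.11.3 message format shown here, and the summed difference is only an indication of what went missing, since what the counter counts (flows or export packets) depends on the export version.

```python
import re
from collections import defaultdict

# Matches the warning format quoted in this thread; other versions may differ.
WARN_RE = re.compile(
    r"expecting flow '(\d+)' but received '(\d+)'.*?agent=(\S+)"
)

def summarize(lines, period_seconds):
    """Return (warnings per second, {agent: summed sequence difference})."""
    gaps = defaultdict(int)
    count = 0
    for line in lines:
        # findall also copes with two warnings fused onto one line.
        for expected, received, agent in WARN_RE.findall(line):
            count += 1
            # Sequence counters are 32-bit, so take the difference modulo 2**32.
            gaps[agent] += (int(received) - int(expected)) % 2**32
    return count / period_seconds, dict(gaps)

sample = [
    "Mar 14 16:16:23 dump02 nfacctd[9651]: WARN: expecting flow '3982342489' "
    "but received '3982343156' collector=(null):2100 agent=193.156.90.68:1792",
    "Mar 14 16:16:23 dump02 nfacctd[9651]: WARN: expecting flow '3982343156' "
    "but received '3982343533' collector=(null):2100 agent=193.156.90.68:1792",
]
rate, gaps = summarize(sample, period_seconds=1)
print(rate, gaps)

# Rate for the whole incident: 26328 warnings in 43 minutes.
incident_rate = round(26328 / (43 * 60), 1)
print(incident_rate)
```

Run against the full syslog with `period_seconds=43 * 60`, this would give the per-agent breakdown for the whole incident.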

I will try the nfacctd_disable_checks option, though I am reluctant as I expect 
to lose more flow information (I need to schedule it). What surprises me is 
that there were plenty of CPU cycles left during this incident (at least on 
threads/CPUs other than the one nfacctd used); only the load was slightly high. 
The Nfacctd host is a dual Xeon 3GHz and receives around 700-1500 Netflow/cFlow 
datagrams per second depending on the time of day. As flow data is heavily 
aggregated, there should (and seems to) be little problem dealing with this. 
Does anybody have experience with how many datagrams it should be able to
handle? The path from the sources to the Netflow host is 10Gb except for the
last hop (switch to host NIC), which is 1Gb. If we are on the edge of what can
be dealt
with, I have to revert to sampled flows on the Ciscos as well.
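
For a sense of scale on the 1Gb hop, here is a back-of-the-envelope sketch of the export bit rate at the datagram rates mentioned above. It assumes NetFlow v5 export with every datagram full (a 24-byte header plus 30 records of 48 bytes); the thread does not state the export version, so treat the sizes as an assumption. The function name `export_mbit_per_s` is illustrative only.

```python
# Assumption: NetFlow v5 export, where a datagram holds a 24-byte header
# plus at most 30 flow records of 48 bytes each. Other versions differ.
V5_HEADER_BYTES = 24
V5_RECORD_BYTES = 48
V5_MAX_RECORDS = 30

def export_mbit_per_s(datagrams_per_s, payload_bytes):
    """Payload bit rate in Mbit/s, ignoring UDP/IP/Ethernet overhead."""
    return datagrams_per_s * payload_bytes * 8 / 1e6

# Largest v5 payload: header plus a full set of records.
max_payload = V5_HEADER_BYTES + V5_MAX_RECORDS * V5_RECORD_BYTES

for rate in (700, 1500):
    print(rate, "datagrams/s ->", round(export_mbit_per_s(rate, max_payload), 1), "Mbit/s")
```

Under these assumptions the export traffic stays in the low tens of Mbit/s, which suggests that raw link capacity on the last hop is unlikely to be the limit; per-datagram processing on the host is a separate question.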


all the best,

-- Inge


> -----Original Message-----
> From: Paolo Lucente [mailto:[EMAIL PROTECTED] 
> Sent: 15. mars 2007 01:07
> To: Inge Bjørnvall Arnesen
> Cc: [email protected]
> Subject: Re: [pmacct-discussion] nfacctd warnings
> 
> Hi Inge,
> 25k messages in 45 mins makes some 9-10 messages per second, 
> which is quite a lot. Which network devices are you getting 
> NetFlow datagrams from? One possible reason: sequence 
> checks fail (e.g. they have been reported to fail with 
> Huawei's implementation of NetFlow) and grab a lot of CPU cycles 
> by logging massively; as a result the box is unable 
> to process incoming datagrams at full rate. Please note that 
> datagrams failing sequence checks are not discarded. 
> 
> To verify this, you can append "nfacctd_disable_checks: 
> true" to your config. You should not see any further log 
> messages in this regard and can compare whether the graphs show 
> the expected figures.
> 
> If you still have such logs, could you please send me a more 
> substantial fragment privately? I'm curious to see whether there 
> is any evident pattern. Sequence checks were not implemented 
> in 0.10.3. Let me know how things work out. And thanks as 
> usual for your cooperation.
> 
> Cheers,
> Paolo
> 
> On Wed, Mar 14, 2007 at 05:11:50PM +0100, Inge Bjørnvall 
> Arnesen wrote:
> > I know this is an oldie, but I'm very conservative when it 
> > comes to upgrading. Here is my experience with this problem:
> > 
> > I've been running nfacctd - pmacct version 0.10.3 - for 
> > quite some time. I use a memory plugin with an interface to 
> > Cricket for real-time graph presentation and MySQL logging 
> > for batch processing of the stored flows. From time to time 
> > I've been executing fairly complex MySQL queries (resulting 
> > in high load on the nfacctd host - 2 to 4 - but lots of free 
> > CPU time) while nfacctd is running and this has been no 
> > problem. Around 2.5 hours ago I upgraded to 0.11.3 and then 
> > had to make some changes to some MySQL tables, resulting in 
> > fairly high load (around 2.5, but still with a lot of CPU 
> > left). The result was dramatic during the single hour I had 
> > 0.11.3 running:
> > 
> > During the first 15 minutes (when the load was mostly low 
> > as I just created some tables for later use) I received 4 
> > messages like the ones below. After starting the MySQL jobs 
> > and for the following 45 minutes I had around 25000 messages, 
> > all of the format:
> > 
> > Mar 14 16:16:23 dump02 nfacctd[9651]: WARN: expecting flow '3982342489' but received '3982343156' collector=(null):2100 agent=193.156.90.68:1792
> > Mar 14 16:16:23 dump02 nfacctd[9651]: WARN: expecting flow '3982343156' but received '3982343533' collector=(null):2100 agent=193.156.90.68:1792
> > 
> > I have 3 distinct sources of Netflow/cFlow packets and all 
> > three had "lost reports" like this. All plugins had a 
> > dramatic decrease in reported flow data for all IPs (my 
> > estimate is around 60% lost flow information during these 45 
> > minutes). During that time I tried desperately to 
> > troubleshoot the possible cause. Finally I gave up and 
> > reverted to 0.10.3 (while the MySQL jobs were still running). 
> > I received no further warning messages and the Cricket graphs 
> > went immediately back to normal while the MySQL jobs 
> > continued running with unaltered load (they are still running). 
> > 
> > There are 0 errors on the receiving interfaces. There were 
> > no other recorded network-related incidents during that 
> > period. I also have another installation on a site with much 
> > less traffic and more moderate load on the nfacctd host and 
> > there I have recorded one (1) such message with pmacct 0.11.3 
> > (which I will happily write off as a lost UDP packet). There 
> > are no interface errors on this host either.
> > 
> > Any ideas on this?
> > 
> > all the best,
> > 
> > -- Inge
> 

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
