Hi Paolo,
On Sat, 14 Mar 2009, Paolo Lucente wrote:
> About the SQL INSERT conflict, are you by any chance making use of the
> "sql_dont_try_update" directive in your configuration?
Yes I am, because it's much more efficient.
> And are you using 32bit counters?
I think so, yes. I compiled with default options on a 32-bit host.
> The conjunction of these two conditions might explain.
>
> The SQL cache code, while summing up counters, makes a check on whether
> the counter field is about to overflow. When 64bit counters are disabled
> (default) this is what happens:
>
> #define UINT32T_THRESHOLD 4290000000UL
> #define CACHE_THRESHOLD UINT32T_THRESHOLD
>
> /* additional check: bytes counter overflow */
> else if (Cursor->bytes_counter > CACHE_THRESHOLD) {
> if (!staleElem && Cursor->chained) staleElem = Cursor;
> goto follow_chain;
> }
>
> Basically, a new record for the entry which is going to overflow is
> opened and the old one is "parked". When purging the cache to the SQL
> database, both records (the active and the parked one) are sent over:
> the first with an INSERT, the second with an UPDATE. This mechanism is
> valid for any number of overflows - indeed.
>
> The above would also explain why a number of the entries above the 1GB
> level are around 4GB. But this would also suggest the counters are
> genuine. Another thing which would suggest these are "real" is that by
> dividing the bytes counter by the packets counter, you get a consistent
> average data size:
>
> 4290000028 / 10026264 = ~428 bytes
> 3943258731 / 8984686 = ~439 bytes
>
> Any bytes counter roll-over would have greatly skewed one of the above
> two proportions - highlighting an issue. But this would suggest that in
> a single minute roughly 8GB of data were transferred. This translates into
> a fully loaded 1Gbps link. This brings me to these questions: is your LAN
> network (including the "192.168.0.175" host) connected to 1Gbps? Do you
> think it could be possible some LAN traffic gets spanned over?
The local machine is connected to a gigabit switch on the LAN, but this
host is attached to another switch which is not gigabit, so that suggests
to me that the counter is invalid. I just checked on the switch, and the
port that this machine is attached to is currently running at 100 Mbps.
It is possible that either the switch or my firewall/router/pmacct box is
going mental and repeating traffic.
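As a rough sanity check on why I think the counters can't be genuine, here
is a back-of-the-envelope sketch. It only does the arithmetic on the two
byte counters quoted above; the function names are mine (not from pmacct),
and I'm assuming decimal units, a 60-second window and no framing overhead.

```c
#include <stdint.h>

/* Hypothetical helper: upper bound on bytes a link can carry in `seconds`
 * at `link_mbps` megabits per second (decimal units, no framing overhead). */
uint64_t max_bytes_on_link(uint64_t link_mbps, uint64_t seconds)
{
    return link_mbps * 1000000ULL / 8ULL * seconds;
}

/* Hypothetical helper: average rate in Mbps (rounded down) needed to move
 * `bytes` within `seconds`. */
uint64_t required_mbps(uint64_t bytes, uint64_t seconds)
{
    return bytes * 8ULL / seconds / 1000000ULL;
}
```

With the two counters from the thread, required_mbps(4290000028ULL +
3943258731ULL, 60) comes out at roughly 1097 Mbps, while
max_bytes_on_link(100, 60) is 750000000 bytes, so a 100 Mbps port could
carry less than a tenth of what was recorded in that minute.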
Perhaps the best thing to do is to recompile pmacct with 64-bit counters
to see if the issue goes away? Alternatively I planned to log all traffic
with tcpdump -w to create a pcap file that I could replay into pmacctd to
reproduce the problem if it happens again. Would that work? Does pmacctd
honour the timestamps in the pcap file while reading it?
Cheers, Chris.
--
Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES
Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.