Hello everybody,

For first i would like to thank Paolo for very nice piece of software :)

I've got some strange problem, nfaccounting daemon is for some reason locking
my database, causing high load-average ( 1.2 and up for last 5,10,15 
minutes ). 
It's stops when i kill all the nfacctd daemons ( by killall -9 ). I think it's 
related to cache purge ( I came to this conclusion after greping & reading 
source code ).

biuro:~# nfacctd -h 2>&1 | head -1
NetFlow Accounting Daemon, nfacctd 0.9.1
( package built from Debian unstable tree )
Earlier i had the same problem with nfacctd version from Debina stable 
( pmacct 0.7.x ). The ChangeLog doesn't say that in newer version was 
something fixed, so i didn't upgrade ( but I will try to upgrade tomorrow to 
0.9.4 or even 0.9.5 ).

I'm collecting traffic for almost ~2000 home-users ( the goal is to build 
stats for >4000 homeusers ), the database backend is Postgres 7.4 ( from 
Debian stable ). The routers are Linux-based with fprobe running on them.

The nfacctd daemon can run up to 1.5 day normal ( very low CPU used by the 
database ) before it runs "LOCK TABLE", then the DB starts to take CPU.

Earlier i've also tested nfacctd with MySQL 4.1 ( MyISAM and InnoDB later ), 
but switched to Postgres - results are the same. The machine is 1 x Xeon 
3.6GHz with HT-enabled and 1GB RAM ( i'm going to upgrade it to 2 x Xeon 
3.6GHz + 2GB RAM in max 2 weeks, the machine is going to do several other 
things some day ). Postgres has fsync option disabled due to performacne 
reasons.

Snapshot of "ps uaxw" for "bad" situation:
biuro:~# ps uaxww | grep nfacct
root     14252  0.0  1.0 16620 10560 ?       Ss   Nov19   
0:35 /usr/sbin/nfacctd -f /etc/pmacct/nfacctd.conf
root     31300  0.0  1.4 20484 14792 ?       S    Nov19   
0:14 /usr/sbin/nfacctd -f /etc/pmacct/nfacctd.conf
root     17203  0.0  1.3 20092 14328 ?       S    17:45   
0:00 /usr/sbin/nfacctd -f /etc/pmacct/nfacctd.conf
root     23615  0.0  1.4 20348 14576 ?       S    18:15   
0:00 /usr/sbin/nfacctd -f /etc/pmacct/nfacctd.conf
root     22347  0.0  1.4 20348 14576 ?       S    18:45   
0:00 /usr/sbin/nfacctd -f /etc/pmacct/nfacctd.conf
root     31530  0.0  1.4 20480 14788 ?       S    19:15   
0:00 /usr/sbin/nfacctd -f /etc/pmacct/nfacctd.conf
root     31782  0.0  1.4 20480 14788 ?       S    19:45   
0:00 /usr/sbin/nfacctd -f /etc/pmacct/nfacctd.conf
root      6019  0.0  1.0 16628 10560 ?       Ss   19:59   
0:00 /usr/sbin/nfacctd -f /etc/pmacct/nfacctd.conf
root     21837  0.0  1.3 19312 13456 ?       S    19:59   
0:00 /usr/sbin/nfacctd -f /etc/pmacct/nfacctd.conf
root     24685  0.0  1.0 16276 10360 ?       S    19:59   
0:00 /usr/sbin/nfacctd -f /etc/pmacct/nfacctd.conf
root      3834  0.0  1.2 18652 12952 ?       S    20:15   
0:00 /usr/sbin/nfacctd -f /etc/pmacct/nfacctd.conf

Normally there are only 3-4 nfacctd processes when it works the right way.

biuro:~# ps uaxww | grep postgres
postgres 14955  0.0  0.3 23516 3968 ?        S    Nov19   
0:00 /usr/lib/postgresql/bin/postmaster -D /var/lib/postgres/data
postgres  1516  0.0  0.4 14316 4728 ?        S    Nov19   0:10 postgres: stats 
buffer process
postgres 23577  0.0  0.3 13456 3852 ?        S    Nov19   0:07 postgres: stats 
collector process
postgres 19134 11.2  1.3 24332 14344 ?       R    17:45  18:55 postgres: 
pmacct netflow [local] UPDATE
postgres  8800  0.0  0.5 24196 5832 ?        S    18:15   0:00 postgres: 
pmacct netflow [local] LOCK TABLE waiting
postgres  7435  0.0  0.5 24196 5656 ?        S    18:45   0:00 postgres: 
pmacct netflow [local] LOCK TABLE waiting
postgres 20137  0.0  0.5 24196 5692 ?        S    19:15   0:00 postgres: 
pmacct netflow [local] LOCK TABLE waiting
postgres  6593  0.0  0.5 24196 5824 ?        S    19:45   0:00 postgres: 
pmacct netflow [local] LOCK TABLE waiting
postgres 14043  0.0  0.5 24196 5664 ?        S    19:59   0:00 postgres: 
pmacct netflow [local] LOCK TABLE waiting
postgres 27674  0.0  0.5 24196 5840 ?        S    20:15   0:00 postgres: 
pmacct netflow [local] LOCK TABLE waiting

pmacct-0.9.1 was self-compiled using:
gcc version 3.3.5 (Debian 1:3.3.5-13) 
glibc 2.3.2.ds1-22


Config file:
------------
daemonize: true
debug: false
pidfile: /var/run/nfacctd.pid
syslog: daemon
aggregate[in]: dst_host
aggregate[out]: src_host

! FIXME!!!
aggregate_filter[in]: dst net 192.168.0.0/16
aggregate_filter[out]: src net 192.168.0.0/16

plugin_pipe_size: 4096000
plugin_buffer_size: 4096

plugins: pgsql[in], pgsql[out]

sql_table[in]: acct_in
sql_table[out]: acct_out
sql_data[in]: typed
sql_data[out]: typed
sql_data: typed
!sql_table_schema[in]: /etc/pmacct/create-table_in.sql
!sql_table_schema[out]: /etc/pmacct/create-table_out.sql

!sql_multi_values: 512000

sql_db: netflow
!sql_host: localhost
sql_passwd: password
sql_user: pmacct
sql_refresh_time: 1800
sql_optimize_clauses: true
sql_history: 1h
sql_history_roundoff: m
sql_recovery_logfile: /var/lib/pmacct/recovery_log
sql_table_version: 1
sql_startup_delay: 900
sql_preprocess: qnum=1000, minp=5

imt_buckets: 65537
imt_mem_pools_size: 1024000

nfacctd_port: 2055
nfacctd_ip: a.b.c.d
nfacctd_allow_file: /etc/pmacct/allow
---end---

>From the database:
netflow=> select * from pg_stat_activity;
 datid | datname | procpid | usesysid | usename | current_query | query_start
-------+---------+---------+----------+---------+---------------+-------------
 17144 | netflow |   20137 |      100 | pmacct  |               |
 17144 | netflow |    8800 |      100 | pmacct  |               |
 17144 | netflow |   19134 |      100 | pmacct  |               |
 17144 | netflow |   15412 |      100 | pmacct  |               |
 17144 | netflow |    6593 |      100 | pmacct  |               |
 17144 | netflow |    7435 |      100 | pmacct  |               |
 17144 | netflow |   27674 |      100 | pmacct  |               |
 17144 | netflow |   14043 |      100 | pmacct  |               |

netflow=> select count(*) from acct_out;
 count
-------
 15572
(1 row)

netflow=> select count(*) from acct_in;
 count
--------
 111030
(1 row)

netflow=> select stamp_inserted from acct_in group by stamp_inserted order by 
stamp_inserted;
   stamp_inserted
---------------------
 2005-11-19 00:00:00
 2005-11-19 01:00:00
 2005-11-19 08:00:00
 2005-11-19 09:00:00
 2005-11-19 10:00:00
 2005-11-19 11:00:00
 2005-11-19 12:00:00
 2005-11-19 13:00:00
 2005-11-19 14:00:00
 2005-11-19 15:00:00
 2005-11-19 16:00:00
 2005-11-19 17:00:00
 2005-11-19 18:00:00
 2005-11-19 19:00:00
 2005-11-19 20:00:00
 2005-11-19 21:00:00
 2005-11-19 22:00:00
 2005-11-19 23:00:00
 2005-11-20 00:00:00
 2005-11-20 01:00:00
 2005-11-20 02:00:00
 2005-11-20 03:00:00
 2005-11-20 04:00:00
 2005-11-20 05:00:00
 2005-11-20 06:00:00
 2005-11-20 07:00:00
 2005-11-20 08:00:00
 2005-11-20 09:00:00
 2005-11-20 10:00:00
 2005-11-20 11:00:00
 2005-11-20 12:00:00
 2005-11-20 13:00:00
 2005-11-20 14:00:00
 2005-11-20 15:00:00
 2005-11-20 16:00:00
 2005-11-20 17:00:00

Stracing one (staled?) process (strace -e recv,send -p 17203 -o log -s 1024), 
shows that:
recv(7, "C\0\0\0\rUPDATE 0\0Z\0\0\0\5T", 16384, 0) = 20
send(7, "Q\0\0\0\262INSERT INTO acct_in (stamp_updated, stamp_inserted, 
ip_dst, packets, bytes) VALUES (ABSTIME(1132505101):
:Timestamp, ABSTIME(1132502400)::Timestamp, \'192.168.87.215\', 6, 288)\0", 
179, 0) = 179
recv(7, "C\0\0\0\24INSERT 147356 1\0Z\0\0\0\5T", 16384, 0) = 27
send(7, "Q\0\0\0\316UPDATE acct_in SET packets=packets+8, bytes=bytes+384, 
stamp_updated=CURRENT_TIMESTAMP(0) WHERE ABSTIME(
1132502400)::Timestamp::Timestamp without time zone = stamp_inserted AND 
ip_dst=\'192.168.238.113\'\0", 207, 0) = 207
recv(7, "C\0\0\0\rUPDATE 0\0Z\0\0\0\5T", 16384, 0) = 20
send(7, "Q\0\0\0\263INSERT INTO acct_in (stamp_updated, stamp_inserted, 
ip_dst, packets, bytes) VALUES (ABSTIME(1132505101):
:Timestamp, ABSTIME(1132502400)::Timestamp, \'192.168.238.113\', 8, 384)\0", 
180, 0) = 180
recv(7, "C\0\0\0\24INSERT 147357 1\0Z\0\0\0\5T", 16384, 0) = 27
send(7, "Q\0\0\0\315UPDATE acct_in SET packets=packets+9, bytes=bytes+432, 
stamp_updated=CURRENT_TIMESTAMP(0) WHERE ABSTIME(
1132502400)::Timestamp::Timestamp without time zone = stamp_inserted AND 
ip_dst=\'192.168.204.19\'\0", 206, 0) = 206
recv(7, "C\0\0\0\rUPDATE 0\0Z\0\0\0\5T", 16384, 0) = 20
send(7, "Q\0\0\0\262INSERT INTO acct_in (stamp_updated, stamp_inserted, 
ip_dst, packets, bytes) VALUES (ABSTIME(1132505101):
:Timestamp, ABSTIME(1132502400)::Timestamp, \'192.168.204.19\', 9, 432)\0", 
179, 0) = 179
recv(7, "C\0\0\0\24INSERT 147358 1\0Z\0\0\0\5T", 16384, 0) = 27
send(7, "Q\0\0\0\316UPDATE acct_in SET packets=packets+6, bytes=bytes+288, 
stamp_updated=CURRENT_TIMESTAMP(0) WHERE ABSTIME(
1132502400)::Timestamp::Timestamp without time zone = stamp_inserted AND 
ip_dst=\'192.168.178.240\'\0", 207, 0) = 207
recv(7, "C\0\0\0\rUPDATE 0\0Z\0\0\0\5T", 16384, 0) = 20
send(7, "Q\0\0\0\263INSERT INTO acct_in (stamp_updated, stamp_inserted, 
ip_dst, packets, bytes) VALUES (ABSTIME(1132505101):
:Timestamp, ABSTIME(1132502400)::Timestamp, \'192.168.178.240\', 6, 288)\0", 
180, 0) = 180
recv(7, "C\0\0\0\24INSERT 147359 1\0Z\0\0\0\5T", 16384, 0) = 27
send(7, "Q\0\0\0\317UPDATE acct_in SET packets=packets+12, bytes=bytes+576, 
stamp_updated=CURRENT_TIMESTAMP(0) WHERE ABSTIME
(1132502400)::Timestamp::Timestamp without time zone = stamp_inserted AND 
ip_dst=\'192.168.242.166\'\0", 208, 0) = 208
recv(7, "C\0\0\0\rUPDATE 0\0Z\0\0\0\5T", 16384, 0) = 20
send(7, "Q\0\0\0\264INSERT INTO acct_in (stamp_updated, stamp_inserted, 
ip_dst, packets, bytes) VALUES (ABSTIME(1132505101):
:Timestamp, ABSTIME(1132502400)::Timestamp, \'192.168.242.166\', 12, 576)\0", 
181, 0) = 181
recv(7, "C\0\0\0\24INSERT 147360 1\0Z\0\0\0\5T", 16384, 0) = 27
send(7, "Q\0\0\0\315UPDATE acct_in SET packets=packets+5, bytes=bytes+240, 
stamp_updated=CURRENT_TIMESTAMP(0) WHERE ABSTIME(
1132502400)::Timestamp::Timestamp without time zone = stamp_inserted AND 
ip_dst=\'192.168.85.197\'\0", 206, 0) = 206
recv(7, "C\0\0\0\rUPDATE 0\0Z\0\0\0\5T", 16384, 0) = 20
send(7, "Q\0\0\0\262INSERT INTO acct_in (stamp_updated, stamp_inserted, 
ip_dst, packets, bytes) VALUES (ABSTIME(1132505101):
:Timestamp, ABSTIME(1132502400)::Timestamp, \'192.168.85.197\', 5, 240)\0", 
179, 0) = 179
recv(7, "C\0\0\0\24INSERT 147361 1\0Z\0\0\0\5T", 16384, 0) = 27
send(7, "Q\0\0\0\316UPDATE acct_in SET packets=packets+6, bytes=bytes+288, 
stamp_updated=CURRENT_TIMESTAMP(0) WHERE ABSTIME(
1132502400)::Timestamp::Timestamp without time zone = stamp_inserted AND 
ip_dst=\'192.168.197.235\'\0", 207, 0) = 207

Notice the "packets=packets+6, bytes=bytes+288", it is duplicated.
When I was running MySQL/InnoDB i saw after "show processlist" that when
the overload happened i was always having the same query which tried to 
increment the bytes field always by 288 ( the query was interlaced among 
others ).

Any suggestions or maybe i'm doing something wrong ?

-- 
Jakub Wartak
-vnull
FreeBSD/OpenBSD/Linux/Solaris/Network Administrator
http://vnull.pcnet.com.pl/

Reply via email to