Dnia niedziela, 20 listopada 2005 23:59, Paolo Lucente napisaĆ:
> Hello Jakub,
Hello Paolo, thanks for reply :)
> thank you for your words about the project and for the data about your
> acutal issue. The increased number of processes is basically a queue of
> writers that wait to obtain the permission to "use" the DB. This is because
> the use of the DB is protect by LOCKs. A new writer joins the queue each
> 'sql_refresh_time' seconds.
>
> When everything works correctly, a writer is finished with the DB before
> the next one starts. What i've not been able to understand is whether the
> *same* configuration causes the *same* problem under MySQL.
Yes, exactly the same configuration based on MySQL had the same issues.
>
> The hardware is ok and the number of tuples in both the tables seems fine.
> I would suggest to upgrade to either 0.9.3, 0.9.4p1 or the development
> snapshot 0.9.5 in order to further troubleshoot the issue (the process
> title can give additional informations while firing a "ps auxw").
Okay, I self-compiled 0.9.4p1, the results are identical, after several
minutes:
netflow=> select * from pg_stat_activity;
datid | datname | procpid | usesysid | usename |
current_query | query_start
-------+---------+---------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------
17144 | netflow | 30462 | 100 | pmacct | UPDATE acct_in SET
packets=packets+6, bytes=bytes+288, stamp_updated=CURRENT_TIMESTAMP(0) WHERE
ABSTIME(1132588800)::Timestamp::Timestamp without time zone = stamp_inserted
AND ip_dst='192.168.86.98' | 2005-11-21 18:18:27.93401+01
17144 | netflow | 4273 | 100 | pmacct | <IDLE> | 2005-11-21
18:18:23.891838+01
( i tunned pgsql to provide some more statistics about queries that are
currently running ), notice again the "+288" bytes, for me it is very
strange, because as far as i understand ( based on sql_refresh_time option ),
the database shouldn be updated only every 30mins, not all the time... And
that continues updating of database is causing postgres/mysql to generate
very high load-average. And another thing: 192.168.86.98 doesn't exist.
biuro:~# !ps
ps uaxwww | grep nfacct
root 3913 0.0 1.0 16284 10496 ? Ss 17:30 0:00 nfacctd: Core
Process [default]
root 28327 0.0 1.1 17412 11884 ? S 17:30 0:00 nfacctd:
PostgreSQL Plugin [in]
root 6138 0.0 0.9 15432 9824 ? S 17:30 0:00 nfacctd:
PostgreSQL Plugin [out]
root 1143 0.0 1.1 17556 12092 ? S 18:15 0:00 nfacctd:
PostgreSQL Plugin -- DB Writer [in]
biuro:~# ps uaxww | grep postgres
postgres 21139 0.0 0.3 23520 3968 ? S Nov20
0:00 /usr/lib/postgresql/bin/postmaster -D /var/lib/postgres/data
postgres 2744 0.0 0.4 14320 4728 ? S Nov20 0:09 postgres: stats
buffer process
postgres 21894 0.0 0.3 13464 3856 ? S Nov20 0:08 postgres: stats
collector process
postgres 30462 93.7 1.3 24276 14284 ? R 18:15 10:53 postgres:
pmacctnetflow [local] UPDATE
> The fact that no nfacctd process consumes significative shares of the CPU
> and the strace seems to confirm that it should not be matter of any endless
> loop. Are you firing regularly VACUUMs of the tables ?
I think that Debian scripts are doing this already, from /etc/cron.d/postgres:
# Run VACUUM ANALYSE on all databases every 5 hours
2 0,5,10,15,20 * * 1-6 postgres if !
pidof /usr/lib/postgresql/bin/pg_autovacuum > /dev/null &&
[ -x /usr/lib/postgresql/bin/do.maintenance ];
then /usr/lib/postgresql/bin/do.maintenance -a; fi
# On Sunday run a VACUUM FULL ANALYSE as well
# If you run a 24/7 site, you may want to comment out this line and save
VACUUM
# FULL for when you think you really need it
# 10 3 * * sun postgres /usr/bin/test
-x /usr/lib/postgresql/bin/do.maintenance
&& /usr/lib/postgresql/bin/do.maintenance -a -f -F
> From your
> configuration i see you have enabled the 'sql_optimize_clauses'; did you
> setup an INDEX on the tables in order to speed up the queries ? Does the
> EXPLAIN confirm that the queries are using such index ? You can try
> wrapping a query from your strace report.
No i haven't setup an INDEX on the tables, but for now i disabled
'sql_optimize_clauses', after several minutes i'm getting the same.
After several 'select * from pg_stat_activity' i got:
netflow=> select * from pg_stat_activity;
datid | datname | procpid | usesysid | usename | current_query
| query_start
-------+---------+---------+----------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------
17144 | netflow | 29851 | 100 | pmacct | <IDLE>
| 2005-11-21 18:35:36.918+01
17144 | netflow | 25666 | 100 | pmacct | UPDATE acct_in SET
packets=packets+6, bytes=bytes+288, stamp_updated=CURRENT_TIMESTAMP(0) WHERE
ABSTIME(1132592400)::Timestamp::Timestamp without time zone = stamp_inserted
AND ip_dst='192.168.96.195' AND port_src=0 AND port_dst=0 AND ip_proto=0 AND
ip_sr | 2005-11-21 18:35:36.750585+01
(2 rows)
>
> In the end, i see that you are using an extremely big 'sql_refresh_time'.
> Try reducing it, say to 60-120 secs; and commenti out the
> 'sql_startup_delay' directive.
>
Done.
> Cheers,
> Paolo
>
>
> _______________________________________________
> pmacct-discussion mailing list
> [email protected]
> http://muffin.area.ba.cnr.it/mailman/listinfo/pmacct-discussion
--
Jakub Wartak
-vnull
FreeBSD/OpenBSD/Linux/Solaris/Network Administrator
http://vnull.pcnet.com.pl/