[GENERAL] autovacuum not freeing up unused space on 8.3.0
It appears (and I am open to correction) that autovacuum is not operating correctly in 8.3.0. I have a vanilla installation where autovacuum is enabled and running with all the default settings.

I have a table which is continually having rows added to it (~50/sec). For the sake of this example I am limiting it to 20,000 rows, which means that I am continually having to remove rows (100 at a time) as I reach 20,000. When I get to 20,000 rows for the first time, the table disk size (using pg_total_relation_size) is around 5MB. Since autovacuum only kicks in after a while, I would expect the table to get a little bigger (maybe 6-7MB) and then level out as the recovered space is reused. However, the table disk size keeps growing roughly linearly; when I stopped the test it was approaching 40MB and still heading up.

During that time I was running ANALYZE VERBOSE periodically, and I could see the dead rows increase and then drop back down as autovacuum kicked in - the autovacuum worker process was running. It didn't seem to free any space, though. In fact, a VACUUM FULL at that point didn't help a whole lot either.

I ran the same test using manual VACUUMs every 60 seconds and the table size leveled out at 6.6MB, so a normal vacuum appears to work. I changed the manual VACUUM to use the same cost-delay parameters (20ms) as autovacuum and it still worked. So it appears to me that autovacuum is not freeing up dead rows correctly.

I turned on logging for autovacuum, ran the same test, and saw the following messages:

LOG:  automatic vacuum of table "metadb.test.transactions": index scans: 1
      pages: 0 removed, 254 remain
      tuples: 4082 removed, 19957 remain
      system usage: CPU 0.02s/0.02u sec elapsed 1.11 sec
LOG:  automatic vacuum of table "metadb.test.transactions": index scans: 1
      pages: 0 removed, 271 remain
      tuples: 5045 removed, 19954 remain
      system usage: CPU 0.03s/0.03u sec elapsed 1.54 sec
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "metadb.test.transactions"

At this point I had deleted 32,800 rows, as can be seen from the query below, although the logs only indicated that around 10,000 rows had been freed up.

select min(transaction_key), max(transaction_key) from test.transactions;
  min  |  max
-------+-------
 32801 | 52750

Is there anything I have missed as far as setting this up is concerned, anything I could try? I would really rather use autovacuum than manage the vacuums of a whole lot of tables by hand...

Thanks
Stuart

PS. Running on NetBSD 3.
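For completeness, the settings autovacuum is actually running with can be read straight out of pg_settings; a minimal check, nothing installation-specific assumed:

  SELECT name, setting
    FROM pg_settings
   WHERE name LIKE 'autovacuum%'
      OR name IN ('track_counts', 'vacuum_cost_delay')
   ORDER BY name;

In 8.3 autovacuum also requires track_counts = on (the default), since it works off the row-level statistics.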
Re: [GENERAL] autovacuum not freeing up unused space on 8.3.0
>> ERROR:  canceling autovacuum task
>> CONTEXT:  automatic vacuum of table "metadb.test.transactions"
>
> Are these happening regularly? They indicate that something is happening
> on the table that collides with what autovacuum needs to do, and
> autovacuum defers its task. For this to happen you need to be doing ALTER
> TABLE or similar, however; normal UPDATE/INSERT/DELETE should not cause
> autovacuum to cancel itself.

I am not using an ALTER TABLE command, but I am doing periodic ANALYZEs to evaluate the table size. Could this be causing the problem? I notice that stopping the ANALYZE calls appears to eliminate the canceled autovacuums.

What concerns me is that once the size has grown, even a VACUUM FULL doesn't recover the space. Regular external VACUUMs keep the table at around 10MB, but if I use autovacuum and it grows to 40MB, a VACUUM FULL will only get it down to 35MB. Is it possible that a canceled autovacuum could result in permanently lost space?

Out of interest, what kind of fragmentation overhead should I expect for a table in which I maintain a fixed number of rows? E.g. a 20,000-row table which is 6MB before old rows start being deleted will obviously use a larger disk footprint as rows are added and deleted. Does anyone have a rule of thumb which works for them?

Thanks for the response,
Stuart
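On the ANALYZE workaround: if the periodic ANALYZE is only there to watch the dead-row count, the same numbers are available from the statistics views without touching the table at all. Something along these lines (column names as in 8.3's pg_stat_user_tables, table names from the test above):

  SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
   WHERE schemaname = 'test' AND relname = 'transactions';

  SELECT pg_total_relation_size('test.transactions');

These are only the stats collector's estimates, but reading them takes no lock that could collide with autovacuum.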
Re: [GENERAL] autovacuum not freeing up unused space on 8.3.0
>>> ERROR:  canceling autovacuum task
>>> CONTEXT:  automatic vacuum of table "metadb.test.transactions"
>>
>> Are these happening regularly? They indicate that something is happening
>> on the table that collides with what autovacuum needs to do, and
>> autovacuum defers its task. For this to happen you need to be doing ALTER
>> TABLE or similar, however; normal UPDATE/INSERT/DELETE should not cause
>> autovacuum to cancel itself.
>>
>> I am not using an ALTER TABLE command, but I am doing periodic ANALYZEs to
>> evaluate the table size. Could this be causing the problem? I notice that
>> stopping the ANALYZE calls appears to eliminate the canceled autovacuums.
>
> I am trying to reproduce the case here, but could not. Can you post the
> table schema and the operations you are carrying out? Is it just INSERTing
> new rows and DELETEing old rows, or are there any UPDATEs too? Are there
> any long-running transactions open?

It'll take a few minutes, but I'll try to get that information to you. A summary is:

Process 1:
- writing 50 rows/second, 1 row per transaction
- every so often, deleting 100 rows

Process 2:
- running ANALYZE VERBOSE and pg_total_relation_size every second

The result is that autovacuum appears to be canceled.

I was incorrect about autovacuum not recovering space. Once I stop the ANALYZEs, it appears to stabilise and recover some of the space after a little while. At that point a VACUUM FULL does help, and recovers quite a bit of space. I'll run through this again here and provide you with logs and VACUUM printouts.

>> What concerns me is that once the size has grown, even a VACUUM FULL
>> doesn't recover the space. Regular external VACUUMs keep the table at
>> around 10MB, but if I use autovacuum and it grows to 40MB, a VACUUM FULL
>> will only get it down to 35MB. Is it possible that a canceled autovacuum
>> could result in permanently lost space?
>
> AFAIK it should not. Can you also post VACUUM FULL VERBOSE output?

Thanks for your help,
Stuart
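To make the summary above concrete, the two processes boil down to roughly the following (column names are from the schema posted later in the thread; the trim query is only a guess at the shape of the delete, not a copy of the real one):

  -- Process 1: ~50 single-row transactions per second ...
  INSERT INTO test.transactions ("time", cashier, till, ring, ev_tstamp, ev_id, camera)
  VALUES (now(), 'cashier-1', 1, 1, 0, 0, 1);

  -- ... plus an occasional trim of the oldest 100 rows (illustrative only)
  DELETE FROM test.transactions
   WHERE transaction_key <= (SELECT min(transaction_key) + 99 FROM test.transactions);

  -- Process 2: the once-a-second monitoring that appears to cancel autovacuum
  ANALYZE VERBOSE test.transactions;
  SELECT pg_total_relation_size('test.transactions');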
Re: [GENERAL] autovacuum not freeing up unused space on 8.3.0
>> It'll take a few minutes, but I'll try to get that information to you.
>> A summary is:
>>
>> Process 1:
>> - writing 50 rows/second, 1 row per transaction
>> - every so often, deleting 100 rows
>>
>> Process 2:
>> - running ANALYZE VERBOSE and pg_total_relation_size every second
>
> You'll probably have to vacuum pg_statistic as well then.

I presume because of the frequent ANALYZEs? In my real-world application I won't be running ANALYZE manually like this. I am only using it as a quick hack to get a picture of the dead rows in the various tables so I could get a feel for what was happening, and it seems that this is what is causing the conflict with autovacuum...
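For what it's worth, each ANALYZE rewrites the table's rows in the pg_statistic system catalog, so an ANALYZE every second will bloat that catalog as well. A quick check and cleanup, assuming superuser (or catalog owner) rights:

  SELECT pg_total_relation_size('pg_catalog.pg_statistic');
  VACUUM VERBOSE pg_catalog.pg_statistic;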
Re: [GENERAL] autovacuum not freeing up unused space on 8.3.0
>> What concerns me is that once the size has grown, even a VACUUM FULL
>> doesn't recover the space. Regular external VACUUMs keep the table at
>> around 10MB, but if I use autovacuum and it grows to 40MB, a VACUUM FULL
>> will only get it down to 35MB. Is it possible that a canceled autovacuum
>> could result in permanently lost space?
>
> AFAIK it should not. Can you also post VACUUM FULL VERBOSE output?

I have attached the vacuum output below, along with the table definition and a before-and-after of the table size. In this case a full vacuum (on the 20,000-row table) took it down from 34MB to 21MB. Maybe you can tell me if this is reasonable, bearing in mind that after inserting 20,000 rows at the start the size is about 6MB, and under normal vacuuming conditions it sits around 10-12MB. This is better than the last time I ran it, though.

Thanks for the help,
Stuart

metadb=> \d test.transactions
                                    Table "test.transactions"
     Column      |              Type              |                                  Modifiers
-----------------+--------------------------------+------------------------------------------------------------------------------
 transaction_key | bigint                         | not null default nextval('test.transactions_transaction_key_seq'::regclass)
 time            | timestamp(6) without time zone | not null
 cashier         | text                           | not null
 till            | integer                        | not null
 ring            | integer                        | not null
 ev_tstamp       | integer                        | not null
 ev_id           | integer                        | not null
 camera          | integer                        | not null
Indexes:
    "transactions_pkey" PRIMARY KEY, btree (transaction_key)
    "transactions_camera_index" btree (camera)
    "transactions_cashier_index" btree (cashier, transaction_key)
    "transactions_event_index" btree (ring, ev_tstamp, ev_id)
    "transactions_time_index" btree ("time", transaction_key)

metadb=> select pg_total_relation_size('test.transactions');
 pg_total_relation_size
------------------------
               34242560
(1 row)

metadb=> vacuum full verbose test.transactions;
INFO:  vacuuming "test.transactions"
INFO:  "transactions": found 0 removable, 19996 nonremovable row versions in 1592 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 64 to 68 bytes long.
There were 2109 unused item pointers.
Total free space (including removable row versions) is 10199944 bytes.
1416 pages are or will become empty, including 0 at the end of the table.
1347 pages containing 10194740 free bytes are potential move destinations.
CPU 0.00s/0.01u sec elapsed 0.24 sec.
INFO:  index "transactions_pkey" now contains 19996 row versions in 100 pages
DETAIL:  0 index row versions were removed.
42 index pages have been deleted, 42 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transactions_event_index" now contains 19996 row versions in 215 pages
DETAIL:  0 index row versions were removed.
93 index pages have been deleted, 93 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.05 sec.
INFO:  index "transactions_camera_index" now contains 19996 row versions in 146 pages
DETAIL:  0 index row versions were removed.
56 index pages have been deleted, 56 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO:  index "transactions_cashier_index" now contains 19996 row versions in 429 pages
DETAIL:  0 index row versions were removed.
290 index pages have been deleted, 290 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.05 sec.
INFO:  index "transactions_time_index" now contains 19996 row versions in 1496 pages
DETAIL:  115518 index row versions were removed.
1412 index pages have been deleted, 1412 are currently reusable.
CPU 0.00s/0.11u sec elapsed 4.28 sec.
INFO:  "transactions": moved 19996 row versions, truncated 1592 to 208 pages
DETAIL:  CPU 0.12s/0.73u sec elapsed 20.75 sec.
INFO: index "transactions_pkey" now contains 19996 row versions in 112 pages DETAIL: 19996 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.06 sec. INFO: index "transactions_event_index" now contains 19996 row versions in 215 pages DETAIL: 19996 index row versions were removed. 77 index pages have been deleted, 77 are currently reusable. CPU 0.00s/0.01u sec elapsed 0.04 sec. INFO: index "transactions_camera_index" now contains 19996 row versions in 152 pages DETAIL: 19996 index row versions were removed. 58 index pages have been deleted, 58 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.12 sec. INFO: index "transactions_cashier_index" now contains 19996 row versions in 429 pages DETAIL: 19996 index row versions were removed. 273 index pages have been deleted, 273 are currently reusable. CPU 0.00s/0.01u sec elapsed 0.13 s
Re: [GENERAL] autovacuum not freeing up unused space on 8.3.0
>> Process 2:
>> - running ANALYZE VERBOSE and pg_total_relation_size every second
>>
>> The result is that autovacuum appears to be canceled.
>
> Yes, that will pretty much guarantee that an autovacuum is never able to
> complete...

That's what I figured. Since I won't be running ANALYZE under normal circumstances, this should all work fine...

>> What concerns me is that once the size has grown, even a VACUUM FULL
>> doesn't recover the space. Regular external VACUUMs keep the table at
>> around 10MB, but if I use autovacuum and it grows to 40MB, a VACUUM FULL
>> will only get it down to 35MB. Is it possible that a canceled autovacuum
>> could result in permanently lost space?
>
> Are you measuring index as well as table size? VACUUM FULL is no good at
> compacting indexes.

I am measuring pg_total_relation_size, which I believe includes indexes. How does one go about compacting indexes if a VACUUM doesn't do the trick? I see that one recommendation is to drop and recreate the indexes; if one has a system running 24/7, that might not be feasible.

Thanks
Stuart
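On the measuring side: pg_total_relation_size() does include indexes (and any TOAST data), so a quick way to see where the space actually sits is to size the heap and each index separately, for example:

  SELECT pg_relation_size('test.transactions') AS heap_bytes;

  SELECT c.relname AS index_name,
         pg_relation_size(c.oid) AS index_bytes
    FROM pg_class c
    JOIN pg_index i ON i.indexrelid = c.oid
   WHERE i.indrelid = 'test.transactions'::regclass;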
Re: [GENERAL] autovacuum not freeing up unused space on 8.3.0
>> Are you measuring index as well as table size? VACUUM FULL is no good at
>> compacting indexes.
>>
>> I am measuring pg_total_relation_size, which I believe includes indexes.
>> How does one go about compacting indexes if a VACUUM doesn't do the trick?
>> I see that one recommendation is to drop and recreate the indexes; if one
>> has a system running 24/7, that might not be feasible.
>
> The simplest way is to use REINDEX INDEX, but it needs a strong lock. The
> more complex way is to do
>
>   CREATE INDEX CONCURRENTLY index_2 ...   -- duplicating the original index
>   DROP INDEX index;
>
> which does not need to grab a lock for a long period.

That does the trick and gets the table size down to what I'd expect from a 'clean' run. Now I just need to run a few tests to work out what a stable size is for a table with this many rows. Thanks for all the help tracking this down. It's really appreciated :)

Kind regards
Stuart
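Spelled out against one of the indexes from the earlier \d output, the rebuild goes something like this (the _2 name and the final rename are just one way of doing it):

  CREATE INDEX CONCURRENTLY transactions_time_index_2
      ON test.transactions ("time", transaction_key);   -- cannot run inside a transaction block
  DROP INDEX test.transactions_time_index;
  ALTER INDEX test.transactions_time_index_2 RENAME TO transactions_time_index;

The primary key index can't be swapped out this way, since it backs a constraint; that one needs REINDEX INDEX and the stronger lock that comes with it.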
[GENERAL] very delayed autovacuum on certain tables
I am experiencing a strange problem where autovacuum appears to be vacuuming one table in preference to another, even though they have very similar usage patterns. For this test case I have two tables, 'transactions' and 'lineitems', and the ratio of writes is approx 1:3. I am filling these tables as fast as possible, and once I reach approx 1'000'000 transactions (3 million lineitems) they are both periodically trimmed to keep their sizes constant.

The transactions table gets autovacuumed periodically, roughly when I would expect, but autovacuum only seems to run on the lineitems table well after the trigger point (I am using the autovacuum defaults in postgresql.conf). According to pg_stat_user_tables there were 1.7m dead rows and 3.1m live rows when it decided to run the vacuum.

I have been logging data every minute from pg_stat_user_tables (n_tup_ins, n_live_tup, n_dead_tup, pg_total_relation_size()) for each of the tables, and it makes interesting reading. For one thing, the number of dead tuples drops every now and again without the vacuum being run (is it possible that a vacuum is starting and then being terminated before completing?), and also the size of the lineitems table continues increasing where the transactions table levels off as expected.

I was wondering if there is any way I can get more logging information about the autovacuum decision-making to find out exactly what is happening? I also read that the stats are not always accurate under high load and was wondering if this could be affecting the vacuum.

Thanks
Stuart

PS. Running 8.3.1 on NetBSD 3.
PS2. I have attached the postgresql log and the data log (tab-separated).
PS3. I am not (to my knowledge) doing anything other than inserting rows into the database and periodically (every minute) pulling stats from pg_stat_user_tables. I am not running vacuum or analyze manually.

pglog.tgz    Description: Binary data
datalog.tgz  Description: Binary data
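For what it's worth, with the default settings autovacuum only considers a table once its dead tuples exceed roughly autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * table size (50 + 20% in 8.3, using pg_class.reltuples as the table size), so a 3-million-row lineitems table would not be looked at until around 600,000 dead rows. That still would not explain waiting until 1.7m. A rough check against the stats view, approximating reltuples with n_live_tup:

  SELECT relname, n_live_tup, n_dead_tup,
         50 + 0.2 * n_live_tup              AS approx_threshold,
         n_dead_tup > 50 + 0.2 * n_live_tup AS over_threshold
    FROM pg_stat_user_tables
   WHERE relname IN ('transactions', 'lineitems');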
[GENERAL] ANALYZE getting dead tuple count hopelessly wrong
I have a table with about 15 million rows which constantly has tuples added at the head and deleted in blocks from the tail to maintain its size. The dead tuple count in pg_stat_user_tables tracks the deleted rows fairly accurately until an auto-ANALYZE is done in the background, at which point the value it calculates is wrong by a factor of 2-3 (the calculated value is 30-50% of the correct value), which completely throws off the auto-VACUUMing. As an example, the auto-VACUUM only ran when there were 12 million (real) dead rows!

Any ideas?

Thanks
Stuart

PS. Running 8.3.1 on NetBSD 3.
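A simple way to catch the jump is to sample the stats view around each background ANALYZE and compare n_dead_tup with the number of rows actually deleted since the last vacuum (the table name below is just a placeholder):

  SELECT now() AS sampled_at, relname,
         n_live_tup, n_dead_tup,
         last_autoanalyze, last_autovacuum
    FROM pg_stat_user_tables
   WHERE relname = 'my_big_table';   -- placeholder name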
Re: [HACKERS] [GENERAL] ANALYZE getting dead tuple count hopelessly wrong
>> Please do --- I have a lot of other stuff on my plate.
>
> Please see the attached patch. One change I made is to hold the SHARE lock
> on the page while ANALYZE is reading tuples from it. I thought that was the
> right thing to do instead of repeatedly acquiring/releasing the lock.

I have applied the patch and have started my test again. It takes a little while to fill up, so I should have the results sometime tomorrow. Thanks for the quick response.

Stuart
Re: [HACKERS] [GENERAL] ANALYZE getting dead tuple count hopelessly wrong
Pavan Deolasee wrote:
> On Fri, Apr 4, 2008 at 11:10 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> The policy of this project is that we only put nontrivial bug fixes into
>> back branches, and I don't think this item qualifies ...
>
> Got it. I will submit a patch for HEAD.

As I mentioned earlier, I patched 8.3.1 with Pavan's patch and have been running tests. After a few days I have got postgres to lock up - I'm not sure if it is related. Below is a ps from my system (NetBSD 3).

TEST> ps -ax | grep post
 1952 ?  IW
 2472 ?  DW   COMMIT
 2661 ?  DW

[WARN] PSQL:exec - failed in command <SELECT relname,n_tup_ins,n_live_tup,n_dead_tup,pg_total_relation_size('s8_.' || relname)*10/(1024*1024),last_autovacuum FROM pg_stat_user_tables WHERE schemaname='s8_' ORDER BY n_tup_ins DESC>
[WARN] error = 'server closed the connection unexpectedly
        This probably means the server terminated abnormally before or while processing the request.'
[WARN] ConnectionNB: PQconsumeInput failed with error 'server closed the connection unexpectedly
        This probably means the server terminated abnormally before or while processing the request.'

The server is still running but I can't access it. A top yields:

load averages: 0.23, 0.23, 0.21                                      09:53:58
110 processes: 109 sleeping, 1 on processor
Memory: 513M Act, 256M Inact, 1336K Wired, 75M Exec, 557M File, 2776K Free
Swap: 600M Total, 600M Free

  PID USERNAME PRI NICE   SIZE    RES STATE      TIME   WCPU    CPU COMMAND
  463 root       2    0  6132K    14M select     0:06  0.05%  0.05% kdeinit
 2472 postgres -22   -2  4580K     4K mclpl    814:23  0.00%  0.00%
 2631 root     -22    0   644K     4K mclpl    606:25  0.00%  0.00%
  233 root       2    0    24M    31M select     4:47  0.00%  0.00% XFree86
  451 root       2    0  3544K    15M select     4:45  0.00%  0.00% kdeinit
   16 root      18    0     0K   182M syncer     3:51  0.00%  0.00% [ioflush]
   17 root     -18    0     0K   182M aiodoned   1:46  0.00%  0.00% [aiodoned]
   15 root     -18    0     0K   182M pgdaemon   1:30  0.00%  0.00% [pagedaemon]
 1301 root     -22    0  4092K     4K mclpl      1:23  0.00%  0.00%
 2680 postgres   2   -2  3560K  1588K poll       1:18  0.00%  0.00% postgres
 1493 root       2    0  3488K    17M select     1:09  0.00%  0.00% korgac
  461 root       2    0  3748K    16M select     0:57  0.00%  0.00% kdeinit
 3156 postgres   2   -2  3448K  1792K select     0:45  0.00%  0.00% postgres
 1174 root       2    0  2608K  2928K select     0:40  0.00%  0.00% profiler
 1428 root       2    0  3376K    13M select     0:26  0.00%  0.00% kdeinit
 2661 postgres -22   -2  4896K     4K mclpl      0:11  0.00%  0.00%

I'm not convinced this is a PostgreSQL bug (the mclpl state concerns me), but it's the first time I've seen it. I suppose it could be: http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=35224. Anything I can do which might help isolate the problem?

Regards
Stuart