[GENERAL] autovacuum not freeing up unused space on 8.3.0

2008-02-24 Thread Stuart Brooks
It appears (and I am open to correction) that autovacuum is not 
operating correctly in 8.3.0. I have a vanilla installation where 
autovacuum is enabled, and is running with all the default settings.


I have a table which is continually having rows added to it (~50/sec). 
For the sake of this example I am limiting it to 20,000 rows, which means 
that I am continually having to remove rows (100 at a time) as I get to 
20,000.


When I get to 20,000 rows for the first time the table disk size (using 
pg_total_relation_size) is around 5MB. Since autovacuum only kicks 
in after a while I would expect it to get a little bigger (maybe 6-7MB) 
and then level out as I am cycling through recovered rows.


However, the table disk size continued increasing basically linearly, and 
when I stopped the test it was approaching 40MB and heading up. During that 
time I was running ANALYZE VERBOSE periodically and could see the dead 
rows increase and then drop down as autovacuum kicked in - the 
autovacuum worker process was running. It didn't seem to free any space 
though. In fact, a VACUUM FULL at this point didn't help a whole lot either.


I ran the same test using manual VACUUMs every 60 seconds and the 
table size leveled out at 6.6MB, so it appears that a normal vacuum is 
working. I changed the manual VACUUM to use the same cost delay 
parameters (20ms) as autovacuum and it still worked.
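For reference, the manual test above can be run with autovacuum-equivalent cost-based delay settings; the 20ms figure matches the 8.3 default of autovacuum_vacuum_cost_delay, and the table name is the one from this thread (a sketch, not the exact commands the poster used):

```sql
-- Match autovacuum's default cost-based delay on 8.3
SET vacuum_cost_delay = 20;   -- milliseconds, same as autovacuum_vacuum_cost_delay
SET vacuum_cost_limit = 200;  -- 8.3 default

-- Then run the same kind of vacuum autovacuum would issue
VACUUM VERBOSE test.transactions;
```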


So it appears to me that autovacuum is not freeing up dead rows 
correctly.


I turned on logging for autovacuum and ran the same test and saw the 
following messages:


LOG:  automatic vacuum of table "metadb.test.transactions": index scans: 1
  pages: 0 removed, 254 remain
  tuples: 4082 removed, 19957 remain
  system usage: CPU 0.02s/0.02u sec elapsed 1.11 sec
LOG:  automatic vacuum of table "metadb.test.transactions": index scans: 1
  pages: 0 removed, 271 remain
  tuples: 5045 removed, 19954 remain
  system usage: CPU 0.03s/0.03u sec elapsed 1.54 sec
ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "metadb.test.transactions"

At this point I had deleted 32,800 rows, as can be seen from the query 
below, although the logs indicated that only around 10,000 rows had been 
freed up.


metadb=> select min(transaction_key), max(transaction_key) from test.transactions;
  min  |  max
-------+-------
 32801 | 52750
(1 row)


Is there anything I have missed as far as setting this up is concerned, 
anything I could try? I would really rather use autovacuum than manage 
the vacuums of a whole lot of tables by hand...
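(A generic way to check whether autovacuum is actually reaching a table - not something suggested in the thread - is to query the stats views; the columns below exist as of 8.3:)

```sql
-- When did autovacuum last touch each table, and how many dead tuples remain?
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE schemaname = 'test'
ORDER BY n_dead_tup DESC;
```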


Thanks
Stuart

PS. Running on NetBSD 3

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] autovacuum not freeing up unused space on 8.3.0

2008-02-26 Thread Stuart Brooks



ERROR:  canceling autovacuum task
CONTEXT:  automatic vacuum of table "metadb.test.transactions"


Are these happening regularly?  They indicate that something is
happening on the table that collides with what autovacuum needs to do,
and autovacuum defers its task.  For this to happen you need to be doing
ALTER TABLE or similar however; normal UPDATE/INSERT/DELETE should not
cause autovacuum to cancel itself.

I am not using any ALTER TABLE commands, but I am doing periodic ANALYZEs 
to evaluate the table size. Could this be causing the problem? I notice 
that stopping the ANALYZE calls appears to eliminate the canceled 
autovacuums.


What concerns me is that once the size has grown, even a VACUUM FULL 
doesn't recover the space. Regular external VACUUMs keep the table at 
around 10MB but if I use autovacuum and it grows to 40MB, a VACUUM FULL 
will only get it down to 35MB. Is it possible that a canceled autovacuum 
could result in permanently lost space?


Out of interest, what kind of fragmentation overhead should I expect if 
I have a table in which I maintain a fixed number of rows? E.g. a 20,000-row 
table which is 6MB before rows are wrapped out will obviously use a 
larger disk footprint as rows are added and deleted. Does anyone have a rule 
of thumb which works for them?


Thanks for the response,
Stuart




Re: [GENERAL] autovacuum not freeing up unused space on 8.3.0

2008-02-26 Thread Stuart Brooks



 >> ERROR:  canceling autovacuum task
 >> CONTEXT:  automatic vacuum of table "metadb.test.transactions"
 >
 > Are these happening regularly?  They indicate that something is
 > happening on the table that collides with what autovacuum needs to do,
 > and autovacuum defers its task.  For this to happen you need to be doing
 > ALTER TABLE or similar however; normal UPDATE/INSERT/DELETE should not
 > cause autovacuum to cancel itself.
 >
 I am not using an ALTER table command but I am doing periodic ANALYZEs
 to evaluate the table size. Could this be causing the problem? I notice
 that stopping the ANALYZE calls appears to eliminate the canceled
 autovacuum.




I am trying to reproduce the case here, but have not been able to. Can you post the table
schema and the operations you are carrying out? Is it just INSERT new rows
and DELETE old rows, or are there any UPDATEs too? Are there any long-running
transactions open?



It'll take a few minutes but I'll try and get the information to you. A 
summary is:


Process 1:
- writing 50 rows/second, 1 row/transaction.
- every so often delete 100 rows

Process 2:
- running ANALYZE VERBOSE and pg_total_relation_size every second

The result is that autovacuum appears to be canceled.
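(The two processes above can be sketched as the following SQL; table and column names are taken from the schema posted later in the thread, but the exact loop structure is an assumption:)

```sql
-- Session 1: steady insert stream, one row per transaction (~50/sec),
-- plus periodic batch deletes of the 100 oldest rows
INSERT INTO test.transactions ("time", cashier, till, ring, ev_tstamp, ev_id, camera)
VALUES (now(), 'cashier-1', 1, 1, 0, 0, 1);

DELETE FROM test.transactions
WHERE transaction_key IN (
    SELECT transaction_key FROM test.transactions
    ORDER BY transaction_key LIMIT 100);

-- Session 2: polled every second - this is what kept cancelling autovacuum
ANALYZE VERBOSE test.transactions;
SELECT pg_total_relation_size('test.transactions');
```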

I was incorrect about autovacuum not recovering. Once I stop the 
ANALYZEs it appears to stabilise and recover some of the space after a 
little while. At that point a VACUUM FULL does help, and recovers quite 
a bit of space. I'll run through this again here and provide you with 
logs and VACUUM printouts.



 What concerns me is that once the size has grown, even a VACUUM FULL
 doesn't recover the space. Regular external VACUUMs keep the table at
 around 10MB but if I use autovacuum and it grows to 40MB, a VACUUM FULL
 will only get it down to 35MB. Is it possible that a canceled autovacuum
 could result in permanently lost space?



AFAIK it should not. Can you also post VACUUM FULL VERBOSE output ?


Thanks for your help,
Stuart




Re: [GENERAL] autovacuum not freeing up unused space on 8.3.0

2008-02-26 Thread Stuart Brooks



It'll take a few minutes but I'll try and get the information to you. A summary
is:

Process 1:
- writing 50 rows/second, 1 row/transaction.
- every so often delete 100 rows

Process 2:
- running ANALYZE VERBOSE and pg_total_relation_size every second



You'll probably have to vacuum pg_statistic as well then.

  


I presume because of the frequent ANALYZEs? In my real-world application 
I won't be running ANALYZE manually like this. I was only using it as a 
quick hack to get a picture of the dead rows in the various tables, to 
get a feel for what was happening, and it seems that this is what is 
causing the conflict with the autovacuum...
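(The same dead-row picture the ANALYZE hack was providing can be read from the stats collector instead, which takes no lock that conflicts with autovacuum - a generic alternative, not advice from the thread:)

```sql
-- Same information the periodic ANALYZE was being used for,
-- but from pg_stat_user_tables, which autovacuum does not conflict with
SELECT relname,
       n_live_tup,
       n_dead_tup,
       pg_total_relation_size(schemaname || '.' || relname) AS total_bytes
FROM pg_stat_user_tables
WHERE relname = 'transactions';
```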




Re: [GENERAL] autovacuum not freeing up unused space on 8.3.0

2008-02-26 Thread Stuart Brooks



 What concerns me is that once the size has grown, even a VACUUM FULL
 doesn't recover the space. Regular external VACUUMs keep the table at
 around 10MB but if I use autovacuum and it grows to 40MB, a VACUUM FULL
 will only get it down to 35MB. Is it possible that a canceled autovacuum
 could result in permanently lost space?




AFAIK it should not. Can you also post VACUUM FULL VERBOSE output ?

  


I have attached the vacuum output below, along with the table definition 
and a before and after of the table size. In this case a full vacuum (on 
the 20,000-row table) took it down from 34MB to 21MB. Maybe you can tell 
me if this is reasonable, bearing in mind that after inserting 20,000 
rows at the start the size is about 6MB, and under normal vacuuming 
conditions it sits around 10-12MB. This is better than the last time I 
ran it though.


Thanks for the help,
Stuart



metadb=> \d test.transactions
                                  Table "test.transactions"
     Column      |              Type              |                  Modifiers
-----------------+--------------------------------+------------------------------------------------------------
 transaction_key | bigint                         | not null default nextval('test.transactions_transaction_key_seq'::regclass)
 time            | timestamp(6) without time zone | not null
 cashier         | text                           | not null
 till            | integer                        | not null
 ring            | integer                        | not null
 ev_tstamp       | integer                        | not null
 ev_id           | integer                        | not null
 camera          | integer                        | not null
Indexes:
    "transactions_pkey" PRIMARY KEY, btree (transaction_key)
    "transactions_camera_index" btree (camera)
    "transactions_cashier_index" btree (cashier, transaction_key)
    "transactions_event_index" btree (ring, ev_tstamp, ev_id)
    "transactions_time_index" btree ("time", transaction_key)



metadb=> select pg_total_relation_size('test.transactions');
 pg_total_relation_size
------------------------
               34242560
(1 row)


metadb=> vacuum full verbose test.transactions;
INFO:  vacuuming "test.transactions"
INFO:  "transactions": found 0 removable, 19996 nonremovable row versions in 1592 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 64 to 68 bytes long.
There were 2109 unused item pointers.
Total free space (including removable row versions) is 10199944 bytes.
1416 pages are or will become empty, including 0 at the end of the table.
1347 pages containing 10194740 free bytes are potential move destinations.
CPU 0.00s/0.01u sec elapsed 0.24 sec.
INFO:  index "transactions_pkey" now contains 19996 row versions in 100 pages
DETAIL:  0 index row versions were removed.
42 index pages have been deleted, 42 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transactions_event_index" now contains 19996 row versions in 215 pages
DETAIL:  0 index row versions were removed.
93 index pages have been deleted, 93 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.05 sec.
INFO:  index "transactions_camera_index" now contains 19996 row versions in 146 pages
DETAIL:  0 index row versions were removed.
56 index pages have been deleted, 56 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO:  index "transactions_cashier_index" now contains 19996 row versions in 429 pages
DETAIL:  0 index row versions were removed.
290 index pages have been deleted, 290 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.05 sec.
INFO:  index "transactions_time_index" now contains 19996 row versions in 1496 pages
DETAIL:  115518 index row versions were removed.
1412 index pages have been deleted, 1412 are currently reusable.
CPU 0.00s/0.11u sec elapsed 4.28 sec.
INFO:  "transactions": moved 19996 row versions, truncated 1592 to 208 pages
DETAIL:  CPU 0.12s/0.73u sec elapsed 20.75 sec.
INFO:  index "transactions_pkey" now contains 19996 row versions in 112 pages
DETAIL:  19996 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.06 sec.
INFO:  index "transactions_event_index" now contains 19996 row versions in 215 pages
DETAIL:  19996 index row versions were removed.
77 index pages have been deleted, 77 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.04 sec.
INFO:  index "transactions_camera_index" now contains 19996 row versions in 152 pages
DETAIL:  19996 index row versions were removed.
58 index pages have been deleted, 58 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.12 sec.
INFO:  index "transactions_cashier_index" now contains 19996 row versions in 429 pages
DETAIL:  19996 index row versions were removed.
273 index pages have been deleted, 273 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.13 s

Re: [GENERAL] autovacuum not freeing up unused space on 8.3.0

2008-02-26 Thread Stuart Brooks



Process 2:
 - running ANALYZE VERBOSE and pg_total_relation_size every second



  

The result is that autovacuum appears to be canceled.



Yes, that will pretty much guarantee that an autovacuum is never able to
complete...

  
That's what I figured. Since I won't be running ANALYZE under normal 
circumstances this should all work fine...

What concerns me is that once the size has grown, even a VACUUM FULL
doesn't recover the space. Regular external VACUUMs keep the table at
around 10MB but if I use autovacuum and it grows to 40MB, a VACUUM FULL
will only get it down to 35MB. Is it possible that a canceled autovacuum
could result in permanently lost space?



Are you measuring index as well as table size?  VACUUM FULL is no good
at compacting indexes.

  
I am measuring pg_total_relation_size, which I believe includes indexes. 
How does one go about compacting indexes if a VACUUM doesn't do the 
trick? I see that the recommendation is to drop and recreate the indexes, 
but if the system has to run 24/7 this might not be feasible.
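(To separate table bloat from index bloat, these generic queries - not from the thread - break the total down; pg_relation_size counts only the named relation itself:)

```sql
-- Heap only vs. everything (heap + indexes + TOAST)
SELECT pg_relation_size('test.transactions')       AS heap_bytes,
       pg_total_relation_size('test.transactions') AS total_bytes;

-- Per-index sizes for the table
SELECT indexrelname,
       pg_relation_size(indexrelid) AS index_bytes
FROM pg_stat_user_indexes
WHERE relname = 'transactions'
ORDER BY index_bytes DESC;
```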


Thanks
Stuart



Re: [GENERAL] autovacuum not freeing up unused space on 8.3.0

2008-02-27 Thread Stuart Brooks



Are you measuring index as well as table size?  VACUUM FULL is no good
at compacting indexes.
  
  
I am measuring pg_total_relation_size which I believe includes indexes.  
How does one go about compacting indexes if a VACUUM doesn't do the  
trick? I see that a recommendation is to drop and recreate the indexes.  
If one has a system running 24-7, then this might not be feasible.



The simplest way is to use REINDEX INDEX, but it needs a strong lock.

The more complex way is to do

CREATE INDEX CONCURRENTLY index_2 ...-- duplicating the original index
DROP INDEX index;

which does not need to grab a lock for a long period.
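Spelled out for one of the indexes from this table (the new index name and the final rename are illustrative additions, not part of the original advice; note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block):

```sql
-- Rebuild transactions_time_index without holding a long exclusive lock
CREATE INDEX CONCURRENTLY transactions_time_index_2
    ON test.transactions ("time", transaction_key);

DROP INDEX test.transactions_time_index;

-- Optional: restore the original index name
ALTER INDEX test.transactions_time_index_2
    RENAME TO transactions_time_index;
```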

  


That does the trick and gets the table size down to what I'd expect from 
a 'clean' run. Now I just need to run a few tests to work out what a 
stable size is for a table with this many rows. Thanks for all the help 
tracking this down. It's really appreciated :)


Kind regards
Stuart




[GENERAL] very delayed autovacuum on certain tables

2008-03-27 Thread Stuart Brooks
I am experiencing a strange problem where autovacuum appears to be 
vacuuming one table in preference to another even though they have very 
similar usage patterns.


For this test case I have 2 tables, 'transactions' and 'lineitems', and 
the ratio of writes is approx 1:3. I am filling these tables as fast as 
possible and once I reach approx 1,000,000 transactions (3 million lineitems) 
they are both periodically trimmed to keep their sizes constant.


The transactions table gets autovacuumed periodically, roughly when I 
would expect, but the autovacuum only seems to run on the lineitems 
table well after the trigger point (I am using the autovacuum defaults 
in postgresql.conf). According to pg_stat_user_tables there were 1.7m 
dead rows and 3.1m live rows when it decided to run the vacuum.


I have been logging data every minute from pg_stat_user_tables 
(n_tup_ins,n_live_tup,n_dead_tup,pg_total_relation_size()) for each of 
the tables and it makes interesting reading. For one thing, the number 
of dead tuples drops every now and again without the vacuum being run 
(is it possible that a vacuum is starting and then being terminated 
before completing?) and also the size of the lineitems table continues 
increasing where the transactions table levels off as expected.


I was wondering if there is any way I can get more logging information 
about the autovacuum decision making to find out exactly what is 
happening? I also read that the stats are not always accurate under 
high-load and was wondering if this could be affecting the vacuum.
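(For more visibility into autovacuum's decisions on 8.3, the main knob is log_autovacuum_min_duration; the settings below are standard 8.3 parameters, shown here as a suggestion rather than something from the thread:)

```
# postgresql.conf -- log every autovacuum action (0 = log all, -1 = disable)
log_autovacuum_min_duration = 0

# 8.3 defaults that determine when a table qualifies for autovacuum:
#   autovacuum_vacuum_threshold = 50
#   autovacuum_vacuum_scale_factor = 0.2
# i.e. vacuum when dead rows > 0.2 * reltuples + 50
```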


Thanks
Stuart

PS. Running 8.3.1 on NetBSD 3.
PS2. I have attached the postgresql log and the data log (tab-separated).
PS3. I am not (to my knowledge) doing anything other than inserting rows 
into the database and periodically (every minute) pulling stats from 
pg_stat_user_tables. I am not running vacuum or analyze manually.




[Attachments: pglog.tgz (postgresql log), datalog.tgz (data log, tab-separated)]

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] ANALYZE getting dead tuple count hopelessly wrong

2008-03-31 Thread Stuart Brooks
I have a table with about 15 million rows which is constantly having 
tuples added at the head and deleted in blocks from the tail to maintain 
the size. The dead tuple count in pg_stat_user_tables tracks the deleted 
rows fairly accurately until an auto-ANALYZE is done in the background, 
at which point the value it calculates is wrong by a factor of 2-3 
(the calculated value is 30-50% of the correct value), which completely 
throws off the auto-VACUUMing. An example is that the auto-VACUUM only ran 
when there were 12 million (real) dead rows! Any ideas?
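(One way to cross-check the stats collector's dead-tuple estimate against reality is the contrib pgstattuple module, available for 8.3 - note it does a full scan of the table, so it is expensive on 15 million rows; shown as a diagnostic suggestion, not something done in the thread:)

```sql
-- Estimate from the stats collector
SELECT n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'transactions';

-- Exact counts from a full-table scan (requires contrib/pgstattuple)
SELECT dead_tuple_count, dead_tuple_percent
FROM pgstattuple('test.transactions');
```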


Thanks
Stuart

PS. Running 8.3.1 on NetBSD 3.



Re: [HACKERS] [GENERAL] ANALYZE getting dead tuple count hopelessly wrong

2008-04-01 Thread Stuart Brooks



 Please do --- I have a lot of other stuff on my plate.




Please see the attached patch. One change I made is to hold the SHARE lock
on the page while ANALYZE is reading tuples from it. I thought it would
be a right thing to do instead of repeatedly acquiring/releasing the lock.
  
I have applied the patch and have started my test again, it takes a 
little while to fill up so I should have the results sometime tomorrow.


Thanks for the quick response.
Stuart



Re: [HACKERS] [GENERAL] ANALYZE getting dead tuple count hopelessly wrong

2008-04-07 Thread Stuart Brooks

Pavan Deolasee wrote:

On Fri, Apr 4, 2008 at 11:10 AM, Tom Lane <[EMAIL PROTECTED]> wrote:

  

 The
 policy of this project is that we only put nontrivial bug fixes into
 back branches, and I don't think this item qualifies ...




Got it. I will submit a patch for HEAD.

Thanks,
As I mentioned earlier, I patched 8.3.1 with Pavan's patch and have been 
running tests. After a few days I have got postgres to lock up - not 
sure if it is related. Below is a ps from my system (NetBSD 3).


TEST> ps -ax | grep post
 1952 ?  IW   ...
 2472 ?  DW   ... COMMIT
 2661 ?  DW   ...

[WARN] PSQL:exec - failed in command <SELECT 
relname,n_tup_ins,n_live_tup,n_dead_tup,pg_total_relation_size('s8_.' 
|| relname)*10/(1024*1024),last_autovacuum FROM pg_stat_user_tables 
WHERE schemaname='s8_' ORDER BY n_tup_ins DESC>
[WARN]   error = 'server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.'
[WARN] ConnectionNB: PQconsumeInput failed with error 'server closed the 
connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.'


The server is still running but I can't access it. A top yields:

load averages:  0.23,  0.23,  0.21                                09:53:58
110 processes: 109 sleeping, 1 on processor

Memory: 513M Act, 256M Inact, 1336K Wired, 75M Exec, 557M File, 2776K Free
Swap: 600M Total, 600M Free


 PID USERNAME PRI NICE   SIZE   RES STATE  TIME   WCPUCPU COMMAND
 463 root   20  6132K   14M select 0:06  0.05%  0.05% kdeinit
2472 postgres -22   -2  4580K4K mclpl814:23  0.00%  0.00% 
2631 root -220   644K4K mclpl606:25  0.00%  0.00% 
 233 root   2024M   31M select 4:47  0.00%  0.00% XFree86
 451 root   20  3544K   15M select 4:45  0.00%  0.00% kdeinit
  16 root  180 0K  182M syncer 3:51  0.00%  0.00% [ioflush]
  17 root -180 0K  182M aiodoned   1:46  0.00%  0.00% [aiodoned]
  15 root -180 0K  182M pgdaemon   1:30  0.00%  0.00% [pagedaemon]
1301 root -220  4092K4K mclpl  1:23  0.00%  0.00% 
2680 postgres   2   -2  3560K 1588K poll   1:18  0.00%  0.00% postgres
1493 root   20  3488K   17M select 1:09  0.00%  0.00% korgac
 461 root   20  3748K   16M select 0:57  0.00%  0.00% kdeinit
3156 postgres   2   -2  3448K 1792K select 0:45  0.00%  0.00% postgres
1174 root   20  2608K 2928K select 0:40  0.00%  0.00% profiler
1428 root   20  3376K   13M select 0:26  0.00%  0.00% kdeinit
2661 postgres -22   -2  4896K4K mclpl  0:11  0.00%  0.00% 

I'm not convinced this is a postgresql bug (state=mclpl concerns me), 
but it's the first time I've seen it. I suppose it could be: 
http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=35224.


Anything I can do which might help isolating the problem?

Regards
Stuart



