[PERFORM] Vacuum becomes slow
Hi, the time needed for a daily VACUUM on a table with about 28 million records increases from day to day. What's the best way to avoid this? A full vacuum would probably take too much time; are there other ways to keep vacuum performant? The database was updated to postgres-8.0 on Jun 04 this year. Between Jun 07 and Jun 30 the time vacuum needed increased from 683 seconds to 1,663 seconds; the output is posted below. For example, the time for vacuuming the index on a text field (i_ids_user) rose from 123 sec to 668 sec. The increase happens each day, so this is not a problem of the last run only. In the same time the number of records in the table only increased from 27.5 million to 28.9 million; the number of records updated daily is about 700,000 to 1,000,000.

Regards Martin

Tue Jun 7 04:07:17 CEST 2005 Starting
SET VACUUM_MEM=25; VACUUM ANALYZE VERBOSE t_ids

INFO: vacuuming "public.t_ids"
INFO: index "i_ids_score" now contains 4323671 row versions in 12414 pages
DETAIL: 493855 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.76s/5.44u sec elapsed 33.22 sec.
INFO: index "i_ids_id" now contains 2752 row versions in 61515 pages
DETAIL: 960203 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 8.09s/24.93u sec elapsed 108.43 sec.
INFO: index "i_ids_user" now contains 2752 row versions in 103172 pages
DETAIL: 960203 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 14.00s/39.65u sec elapsed 123.47 sec.
INFO: "t_ids": removed 960203 row versions in 203369 pages
DETAIL: CPU 22.88s/21.72u sec elapsed 294.22 sec.
INFO: "t_ids": found 960203 removable, 2752 nonremovable row versions in 208912 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 214149 unused item pointers.
0 pages are entirely empty.
CPU 53.02s/93.76u sec elapsed 643.46 sec.
INFO: vacuuming "pg_toast.pg_toast_224670"
INFO: index "pg_toast_224670_index" now contains 0 row versions in 1 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: "pg_toast_224670": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: analyzing "public.t_ids"
INFO: "t_ids": scanned 9 of 208912 pages, containing 11846838 live rows and 0 dead rows; 9 rows in sample, 27499407 estimated total rows
VACUUM

Tue Jun 7 04:18:40 CEST 2005 Job finished after 683 seconds

Thu Jun 30 01:23:33 CEST 2005 Starting
SET VACUUM_MEM=25; VACUUM ANALYZE VERBOSE t_ids

INFO: vacuuming "public.t_ids"
INFO: index "i_ids_score" now contains 4460326 row versions in 29867 pages
DETAIL: 419232 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 4.58s/7.72u sec elapsed 368.13 sec.
INFO: index "i_ids_id" now contains 28948643 row versions in 68832 pages
DETAIL: 795700 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 9.08s/25.29u sec elapsed 151.38 sec.
INFO: index "i_ids_user" now contains 28948938 row versions in 131683 pages
DETAIL: 795700 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 20.10s/43.27u sec elapsed 668.00 sec.
INFO: "t_ids": removed 795700 row versions in 206828 pages
DETAIL: CPU 23.35s/23.50u sec elapsed 309.19 sec.
INFO: "t_ids": found 795700 removable, 28948290 nonremovable row versions in 223145 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 906106 unused item pointers.
0 pages are entirely empty.
CPU 63.10s/101.96u sec elapsed 1592.00 sec.
INFO: vacuuming "pg_toast.pg_toast_224670"
INFO: index "pg_toast_224670_index" now contains 0 row versions in 1 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_224670": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO: analyzing "public.t_ids"
INFO: "t_ids": scanned 9 of 223146 pages, containing 11675055 live rows and 288 dead rows; 9 rows in sample, 28947131 estimated total rows
VACUUM

Thu Jun 30 01:51:16 CEST 2005 Job finished after 1663 seconds
Re: [PERFORM] Reasons and drawbacks for unused item pointers (was: Update using primary key slow)
Tom Lane <[EMAIL PROTECTED]> writes:
> Denis <[EMAIL PROTECTED]> writes:
>> There were 1905028 unused item pointers.
> The "unused item pointers" number seems a bit high, but otherwise that
> looks pretty reasonable.
>
> Is it possible that the particular row you were updating has been
> updated quite a lot of times since the last vacuum? Or even quite
> a few times within a single transaction?

What causes these "unused item pointers", and what impact do they have on performance? If I understood your last posting correctly, more than one update to a single row between two vacuums would result in one or more "unused item pointers". Does this slow down the vacuum process and/or other processes? Until now I could not find an answer to what this number implies.

Regards Martin
Re: [PERFORM] Reasons and drawbacks for unused item pointers
Tom Lane <[EMAIL PROTECTED]> writes:
> Martin Lesser <[EMAIL PROTECTED]> writes:
>> What causes these "unused item pointers", and what impact do they
>> have on performance?
> The direct performance impact is really pretty minimal (and none at
> all on indexscans, AFAIR). The reason Denis' number drew my attention
> was that it implied that the table had gone un-vacuumed for awhile at
> some time in the past. [...] To have 1905028 unused pointers in a
> table with only 5106307 live entries suggests that at some point there
> were 1.9 million (or so) dead but not-yet-vacuumed tuples, which
> suggests insufficient vacuuming.

Does each update of a single row result in an "unused item pointer"? I.e., if I update one row 10 times between VACUUMs of the table, is the result 10 unused pointers? Some rows in some of my tables are updated much more frequently than others, so I'm not sure whether the number of unused pointers implies that I should VACUUM more often than every 24 hours.

Martin
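Roughly, yes: in these releases every UPDATE creates a complete new row version, VACUUM later marks the dead versions' item pointers as unused, and the next VACUUM VERBOSE reports them in its "unused item pointers" line until new tuples reuse the slots. A small sketch to watch this happen (table and column names are made up for the demonstration; the exact INFO lines vary by version):

  CREATE TABLE t_demo (id int PRIMARY KEY, val int);
  INSERT INTO t_demo VALUES (1, 0);

  UPDATE t_demo SET val = val + 1;  -- each UPDATE leaves one dead row version
  UPDATE t_demo SET val = val + 1;
  UPDATE t_demo SET val = val + 1;

  VACUUM VERBOSE t_demo;  -- reports "removed 3 row versions"
  VACUUM VERBOSE t_demo;  -- reports "There were 3 unused item pointers."

So a row updated 10 times between two vacuums contributes 10 dead row versions to the first VACUUM and 10 unused item pointers to the next.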
[PERFORM] Effects of cascading references in foreign keys
What effect do UPDATEs on REFERENCEd TABLEs have when only columns in the referenced table are updated which are not part of the FOREIGN KEY constraint?

I have one "master" table like

  create table t_master (
    m_id serial primary key,
    m_fld1 ...,
    m_fld2 ...,
    ...
  )

The table above is referenced from several (~30) other tables, i.e. like

  create table t_detail (
    d_ebid int REFERENCES t_master (m_id) ON UPDATE CASCADE ON DELETE CASCADE,
    d_fld1 ...,
    d_fld2 ...,
    ...
  )

All tables which reference t_master have appropriate indexes on the referencing columns; vacuum/analyze is done regularly (daily).

Does an UPDATE of e.g. m_fld1 in t_master cause a 'lookup' in all tables which have a cascading update rule, or is this 'lookup' only triggered if the referenced column in t_master is explicitly updated? After removing some detail tables which are no longer needed we see improved performance, so at the moment it _looks_ like each update in t_master triggers a 'lookup' in each referencing table even if the referenced column (m_id) is not changed. I've read "If the row is updated, but the referenced column is not actually changed, no action is done." in the docs, but it is not clear to me whether this "no action" really means "null action" and so the improved performance has other reasons.

TIA, Martin
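A quick way to measure this empirically, as a sketch assuming the tables above are populated (\timing is a psql command; from 8.1 on, EXPLAIN ANALYZE additionally reports time spent in triggers):

  \timing on
  BEGIN;
  -- only the non-referenced column m_fld1 is touched; m_id stays the same
  UPDATE t_master SET m_fld1 = m_fld1 WHERE m_id BETWEEN 1 AND 10000;
  ROLLBACK;

Comparing the elapsed time before and after dropping one of the ~30 FOREIGN KEY constraints shows how much of it the RI triggers account for.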
Re: [PERFORM] Effects of cascading references in foreign keys
Bruce Momjian writes:
> Thomas F. O'Connell wrote:
>> It seems like this warrants an item somewhere in the release notes,
>> and I'm not currently seeing it (or a related item) anywhere. Perhaps
>> E.1.3.1 (Performance Improvements)? For some of the more extreme
>> UPDATE scenarios I've seen, this could be a big win.
> Hard to say, perhaps:
>
>   Prevent referential integrity triggers from firing if referenced
>   columns are not changed by an UPDATE
>
> Previously, triggers would fire but do nothing.

And this "firing" has negative effects on performance, at least in versions before 8.1 (we use 8.0.3 in production). One really dirty hack that comes to mind is to put an additional pk_table (with only one field, the pk from the master) between the "master" table and the ~30 detail tables, so that each update in the "master" would in most cases only trigger a lookup in one table. Only if a pk really changed would the CASCADEd trigger force a triggered UPDATE in the detail tables.

After denormalization of two of the largest detail tables into one table, the performance improvement was about 10%, due to the fact that up to 1 million of the roughly 30 million rows in the "master" table are updated daily, which triggered lookups across 190 million rows before denormalization vs. 115 million rows after.
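A sketch of that indirection with hypothetical names (the thin table is called t_master_pk here): only the thin table references t_master, so an UPDATE of payload columns in t_master fires the RI trigger for a single constraint, and the ~30 detail-table triggers can only be reached through an actual pk change cascading into t_master_pk:

  -- thin indirection table: one row per master row, nothing else
  create table t_master_pk (
    p_id int primary key
      references t_master (m_id) on update cascade on delete cascade
  );

  -- detail tables now point at the thin table instead of t_master
  create table t_detail_1 (
    d_ebid int references t_master_pk (p_id)
      on update cascade on delete cascade,
    d_fld1 text  -- payload columns as before
  );

The price is keeping t_master_pk in sync on INSERT/DELETE of master rows (e.g. via triggers) and one extra hop in the integrity chain.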
[PERFORM] Partitioning / constraint exclusion not working with %-operator
I am trying to partition a large table (~120 million rows) into 50 smaller tables, but with the (IMO immutable) %-operator constraint exclusion does not work as expected:

  CREATE TABLE tt_m (id1 int, cont varchar);
  CREATE TABLE tt_0 (check (id1 % 50 = 0)) INHERITS (tt_m);
  CREATE TABLE tt_1 (check (id1 % 50 = 1)) INHERITS (tt_m);
  CREATE RULE ins_tt_0 AS ON INSERT TO tt_m WHERE id1 % 50 = 0
    DO INSTEAD INSERT INTO tt_0 VALUES (new.*);
  CREATE RULE ins_tt_1 AS ON INSERT TO tt_m WHERE id1 % 50 = 1
    DO INSTEAD INSERT INTO tt_1 VALUES (new.*);
  ...
  INSERT INTO tt_m (id1,cont) VALUES (0,'Test1');
  INSERT INTO tt_m (id1,cont) VALUES (1,'Test2');

  EXPLAIN SELECT * FROM tt_m WHERE id1=1;
                              QUERY PLAN
  ---
   Result  (cost=0.00..73.50 rows=18 width=36)
     ->  Append  (cost=0.00..73.50 rows=18 width=36)
           ->  Seq Scan on tt_m  (cost=0.00..24.50 rows=6 width=36)
                 Filter: (id1 = 1)
           ->  Seq Scan on tt_0 tt_m  (cost=0.00..24.50 rows=6 width=36)
                 Filter: (id1 = 1)
           ->  Seq Scan on tt_1 tt_m  (cost=0.00..24.50 rows=6 width=36)
                 Filter: (id1 = 1)
  ...

Only adding an explicit %-call to the query results in the expected plan:

  EXPLAIN SELECT * FROM tt_m WHERE id1=1 AND id1 % 50 = 1;
                              QUERY PLAN
  ---
   Result  (cost=0.00..60.60 rows=2 width=36)
     ->  Append  (cost=0.00..60.60 rows=2 width=36)
           ->  Seq Scan on tt_m  (cost=0.00..30.30 rows=1 width=36)
                 Filter: ((id1 = 1) AND ((id1 % 50) = 1))
           ->  Seq Scan on tt_1 tt_m  (cost=0.00..30.30 rows=1 width=36)
                 Filter: ((id1 = 1) AND ((id1 % 50) = 1))

Did I miss something, and/or how could I force the planner to use constraint exclusion without adding the explicit second condition above?

TIA, Martin
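For comparison, a sketch of the same structure partitioned on ranges, where the planner can prove exclusion from the simple comparison operators it understands, so a plain equality in the WHERE clause is enough (boundaries illustrative):

  CREATE TABLE tt_r (id1 int, cont varchar);
  CREATE TABLE tt_r_0 (CHECK (id1 >= 0       AND id1 < 1000000)) INHERITS (tt_r);
  CREATE TABLE tt_r_1 (CHECK (id1 >= 1000000 AND id1 < 2000000)) INHERITS (tt_r);

  SET constraint_exclusion = on;
  EXPLAIN SELECT * FROM tt_r WHERE id1 = 1;
  -- only tt_r and tt_r_0 appear in the Append; tt_r_1 is excluded

The planner does not evaluate arbitrary expressions like id1 % 50 when comparing a WHERE clause against CHECK constraints, which is why the modulus scheme needs the redundant predicate spelled out.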
Re: [PERFORM] Partitioning / constraint exclusion not working with %-operator
Tom Lane <[EMAIL PROTECTED]> writes:
> It's usually better to use partitioning rules that have something to
> do with the WHERE-clauses you'd be using anyway. For instance, try
> to partition on ranges.

I agree and tried to create new partitioned tables. But now I ran into some other performance-related trouble when inserting (parts of) the old (unpartitioned) table into the new one:

  CREATE TABLE t_unparted (id1 int, cont varchar);
  -- Populate table with 1000 records with id1 from 1 to 1000 and ANALYZE
  CREATE TABLE t_parted (id1 int, cont varchar);
  CREATE TABLE t_parted_000 (check (id1 >= 0 AND id1 < 100)) INHERITS (t_parted);
  CREATE RULE ins_000 AS ON INSERT TO t_parted WHERE id1 >= 0 AND id1 < 100
    DO INSTEAD INSERT INTO t_parted_000 VALUES (new.*);
  -- ... 8 more tables + 8 more rules
  CREATE TABLE t_parted_900 (check (id1 >= 900 AND id1 < 1000)) INHERITS (t_parted);
  CREATE RULE ins_900 AS ON INSERT TO t_parted WHERE id1 >= 900 AND id1 < 1000
    DO INSTEAD INSERT INTO t_parted_900 VALUES (new.*);

And now:

  EXPLAIN INSERT INTO t_parted SELECT * FROM t_parted WHERE id1 >= 0 AND id1 < 100;

   Result  (cost=0.00..170.80 rows=12 width=36)
     ->  Append  (cost=0.00..170.80 rows=12 width=36)
           ->  Seq Scan on t_parted  (cost=0.00..85.40 rows=6 width=36)
                 Filter: ((id1 >= 0) AND (id1 < 100) AND (((id1 >= 0) AND (id1 < 100)) IS NOT TRUE) AND (((id1 >= 100) AND (id1 < 200)) IS NOT TRUE) AND (((id1 >= 200) AND (id1 < 300)) IS NOT TRUE) AND (((id1 >= 300) AND (id1 < 400)) IS NOT TRUE) AND (((id1 >= 400) AND (id1 < 500)) IS NOT TRUE) AND (((id1 >= 500) AND (id1 < 600)) IS NOT TRUE) AND (((id1 >= 600) AND (id1 < 700)) IS NOT TRUE) AND (((id1 >= 700) AND (id1 < 800)) IS NOT TRUE) AND (((id1 >= 800) AND (id1 < 900)) IS NOT TRUE) AND (((id1 >= 900) AND (id1 < 1000)) IS NOT TRUE))
           ->  Seq Scan on t_parted_000 t_parted  (cost=0.00..85.40 rows=6 width=36)
                 Filter: ((id1 >= 0) AND (id1 < 100) AND (((id1 >= 0) AND (id1 < 100)) IS NOT TRUE) AND (((id1 >= 100) AND (id1 < 200)) IS NOT TRUE) AND (((id1 >= 200) AND (id1 < 300)) IS NOT TRUE) AND (((id1 >= 300) AND (id1 < 400)) IS NOT TRUE) AND (((id1 >= 400) AND (id1 < 500)) IS NOT TRUE) AND (((id1 >= 500) AND (id1 < 600)) IS NOT TRUE) AND (((id1 >= 600) AND (id1 < 700)) IS NOT TRUE) AND (((id1 >= 700) AND (id1 < 800)) IS NOT TRUE) AND (((id1 >= 800) AND (id1 < 900)) IS NOT TRUE) AND (((id1 >= 900) AND (id1 < 1000)) IS NOT TRUE))
   Result  (cost=0.00..66.40 rows=12 width=36)
     ->  Append  (cost=0.00..66.40 rows=12 width=36)
           ->  Seq Scan on t_parted  (cost=0.00..33.20 rows=6 width=36)
                 Filter: ((id1 >= 0) AND (id1 < 100) AND (id1 >= 0) AND (id1 < 100))
           ->  Seq Scan on t_parted_000 t_parted  (cost=0.00..33.20 rows=6 width=36)
                 Filter: ((id1 >= 0) AND (id1 < 100) AND (id1 >= 0) AND (id1 < 100))
  ...
   Result  (cost=0.00..33.20 rows=6 width=36)
     ->  Append  (cost=0.00..33.20 rows=6 width=36)
           ->  Seq Scan on t_parted  (cost=0.00..33.20 rows=6 width=36)
                 Filter: ((id1 >= 0) AND (id1 < 100) AND (id1 >= 900) AND (id1 < 1000))
  (58 rows)

The filters the planner appends (one per rule) do not make any sense for this query and cost too much time when the old table is huge. (constraint_exclusion was ON.)

Is there a better way to partition an existing table with a large number of rows (>100 million)?

TIA, Martin
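One workaround, sketched against the schema above: since the rules only exist to route future inserts through the parent, a one-time migration can bypass them and load each child table directly, so the planner never folds the rule conditions into the scans:

  INSERT INTO t_parted_000 SELECT * FROM t_unparted WHERE id1 >= 0 AND id1 < 100;
  -- ... one INSERT ... SELECT per partition ...
  INSERT INTO t_parted_900 SELECT * FROM t_unparted WHERE id1 >= 900 AND id1 < 1000;

Each statement is then a single scan of the old table with a simple filter (or an index range scan, if id1 is indexed), instead of the rule-expanded plans shown above.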