Christopher Browne wrote:
> What is unclear to me in the discussion is whether or not this is
> invalidating the item on the TODO list...
>
> -------------------
> Create a bitmap of pages that need vacuuming
>
> Instead of sequentially scanning the entire table, have the background
> writer or some other process record pages that have expired rows, then
> VACUUM can look at just those pages rather than the entire table. In
> the event of a system crash, the bitmap would probably be
> invalidated. One complexity is that index entries still have to be
> vacuumed, and doing this without an index scan (by using the heap
> values to find the index entry) might be slow and unreliable,
> especially for user-defined index functions.
> -------------------
>
> It strikes me as a non-starter to draw vacuum work directly into the
> foreground; there is a *clear* loss in that the death of the tuple
> can't actually take place at that point, due to MVCC and the fact that
> it is likely that other transactions will be present, keeping the
> tuple from being destroyed.
>
> But it would *seem* attractive to do what is in the TODO, above.
> Alas, the user defined index functions make cleanout of indexes much
> more troublesome :-(. But what's in the TODO is still "wholesale,"
> albeit involving more targetted selling than the usual Kirby VACUUM
> :-).

What bothers me about the TODO item is that if we have to sequentially
scan indexes, are we really gaining much by not having to sequentially
scan the heap? If the heap is large enough to gain from a bitmap, the
index is going to be large too. Is disabling per-index cleanout for
expression indexes the answer?

The entire expression index problem is outlined in this thread:

	http://archives.postgresql.org/pgsql-hackers/2006-02/msg01127.php

I don't think it is a show-stopper because if we fail to find the index
entry that matches the heap tuple, we know we have a problem and can
report it and fall back to an index scan.

Anyway, as I remember, if you have a 20gig table, a vacuum / sequential
scan is painful, but if we have to sequentially scan all the indexes,
that is probably just as painful. If we can't make headway there and we
can't clean out indexes without a sequential index scan, I think we
should just remove the TODO item and give up on improving vacuum
performance.

For the bitmaps, index-only scans require a bit that says "all page
tuples are visible" while vacuum wants "some tuples are expired".
DELETE would flip both bits (clearing the first and setting the
second), while INSERT would clear just the first, and UPDATE is a mix
of INSERT and DELETE, though perhaps on different pages.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  SRA OSS, Inc.   http://www.sraoss.com

  + If your life is a hard drive, Christ can be your backup. +
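
P.S. To make the two page-level bits from the bitmap paragraph concrete, here is a rough sketch of how they might be maintained. It is only an illustration under my own assumptions: the class and method names (PageBits, on_insert, on_vacuum, and so on) are hypothetical and do not correspond to anything in the PostgreSQL source, and it ignores crash safety and the index cleanout problem entirely.

```python
# Hypothetical sketch of the two per-page bits discussed above.
# None of these names exist in PostgreSQL; they are invented for illustration.

class PageBits:
    def __init__(self, npages):
        # Bit 1: every tuple on the page is visible to all transactions
        # (what index-only scans would want to know).
        self.all_visible = [False] * npages
        # Bit 2: the page holds at least one expired tuple
        # (what vacuum would want to know).
        self.has_expired = [False] * npages

    def on_insert(self, page):
        # A freshly inserted tuple is not yet visible to everyone.
        self.all_visible[page] = False

    def on_delete(self, page):
        # The page is no longer all-visible and now has something to vacuum.
        self.all_visible[page] = False
        self.has_expired[page] = True

    def on_update(self, old_page, new_page):
        # UPDATE behaves like a DELETE of the old version plus an INSERT
        # of the new one, possibly on a different page.
        self.on_delete(old_page)
        self.on_insert(new_page)

    def pages_to_vacuum(self):
        # Vacuum visits only the flagged pages instead of the whole heap.
        return [p for p, flagged in enumerate(self.has_expired) if flagged]

    def on_vacuum(self, page, all_visible_now):
        # Assumption: vacuum removed every expired tuple on this page and
        # tells us whether the remaining tuples are visible to everyone.
        self.has_expired[page] = False
        self.all_visible[page] = all_visible_now
```

With four pages, an UPDATE that moves a row from page 1 to page 3 leaves only page 1 flagged for vacuum, and neither page is all-visible until vacuum has visited it. The index side is not modeled here, which is exactly the part that worries me.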