On Fri, Jun 5, 2015 at 2:59 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> deavid <deavidsed...@gmail.com> writes:
>> Some people already asked for "delayed write" indexes, but the idea gets
>> discarded because the index could get out of sync, so it can omit results
>> and this is unacceptable. But i think maybe that could be fixed in several
>> ways and we can have a fast and reliable index (but maybe not so fast on
>> selects).
>
> FWIW, GIN indexes already implement something that's like your mode 2 but
> a bit better: there's an unordered "pending insertions" list that has to
> be scanned by every search in addition to checking the main index body.
> Every so often the pending insertions list is automatically flushed into
> the main index body.
>
> The reason we do this for GIN is that that index type puts a huge premium
> on doing inserts "in bulk"; it's a lot more efficient if you push many
> rows into the index at once, because frequently they'll be inserting into
> the same per-key posting lists. I do not see much opportunity for a
> corresponding gain for btree.
A forest of btrees (say, mode 2.c) may not be a bad idea. When tables grow
steadily, the I/O cost is usually dominated by full-page writes and random
I/O, both caused by index updates being spread across the whole index. I
don't have numbers, but on the databases I've handled it has certainly been
so.

If you had a btree_forest AM consisting of several btrees that follow the
GIN pattern, only with an ordered btree instead of an unordered pending
list (which also simplifies merging), you should gain a lot. The big btrees
would be read-only, so they could be packed tight (100% fill rate); you
would generate less WAL, since updates are all local to the small "staging"
btree; and even the disk may perform better with that access pattern. It is
in fact a pattern inverted indexes already use, so it wouldn't be too
far-fetched.

It is, however, hard to figure out when compaction has to happen.
Concurrency shouldn't be an issue, though: all but the smallest btree would
be read-only, so you only need a lock while modifying the forest structure
(adding a new btree, swapping components for their merged versions, etc.).
Compaction would require a lot of extra storage, though; an alternative
would be to implement it as a massive insert/delete instead. How exactly
compaction gets implemented would be key in deciding whether the approach
breaks even.
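To make the shape of the idea concrete, here is a minimal toy sketch, not
PostgreSQL code and not a proposed AM API: a small mutable staging tree
absorbs all inserts, searches probe every component (like a GIN search has
to scan the pending list plus the main index), and compaction merges
everything into one read-only run. All names here (BTreeForest, SortedRun,
staging_limit) are made up for illustration.

import bisect

class SortedRun:
    """One component tree of the forest, modeled as a sorted list of
    (key, tid) pairs; a real AM would store an on-disk btree instead."""
    def __init__(self, items=()):
        self.items = sorted(items)

    def insert(self, key, tid):
        bisect.insort(self.items, (key, tid))

    def search(self, key):
        # Find all tids for `key`; tuples compare lexicographically,
        # so (key,) sorts before any (key, tid).
        i = bisect.bisect_left(self.items, (key,))
        out = []
        while i < len(self.items) and self.items[i][0] == key:
            out.append(self.items[i][1])
            i += 1
        return out

class BTreeForest:
    def __init__(self, staging_limit=1000):
        self.staging = SortedRun()   # small, takes all the writes
        self.frozen = []             # big, read-only components
        self.staging_limit = staging_limit

    def insert(self, key, tid):
        # All write activity stays local to the staging tree, so dirty
        # pages (and hence WAL) stay concentrated instead of being
        # scattered over the whole index.
        self.staging.insert(key, tid)
        if len(self.staging.items) >= self.staging_limit:
            self.compact()

    def search(self, key):
        # A lookup has to probe every component of the forest.
        results = self.staging.search(key)
        for run in self.frozen:
            results.extend(run.search(key))
        return results

    def compact(self):
        # Merge the staging tree and the frozen runs into one new
        # read-only run. In a real AM, only the final swap of the
        # forest structure would need a lock; the merge itself could
        # run beside concurrent readers, at the cost of the temporary
        # extra storage mentioned above.
        everything = list(self.staging.items)
        for run in self.frozen:
            everything.extend(run.items)
        self.frozen = [SortedRun(everything)]
        self.staging = SortedRun()

if __name__ == "__main__":
    forest = BTreeForest(staging_limit=4)
    for i, key in enumerate([3, 1, 3, 2, 5, 3]):
        forest.insert(key, "tid%d" % i)
    print(forest.search(3))   # hits come from both staging and frozen runs

The sketch always merges the whole forest into a single run, which is the
simplest policy but also the most expensive one; any serious version would
need a smarter trigger for when and how much to merge, which is exactly
where the "when does compaction happen" question above bites.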