On 06/06/15 04:07, deavid wrote:
There are several use cases where I see an index as useful, but adding it
would slow down inserts and updates too much.
For example, when we have 10 million rows in a table that is updated
frequently, we need several indexes to speed up selects, but those
indexes slow down updates considerably, especially when there are 10 or
more of them.
Other cases involve indexes for text search, which are used only for
user search and aren't that important; we want to have them, but we
don't want the overhead they add whenever we write to the table.
I know different approaches that already solve some of these problems
in some ways (table partitioning, partial indexes, etc.), but I don't
feel they are the solution to every problem of this kind.
Some people have already asked for "delayed write" indexes, but the
idea gets discarded because the index could get out of sync; it could
then omit results, and that is unacceptable. But I think that could be
fixed in several ways, giving us a fast and reliable index (though
maybe not as fast on selects).
Since I do not know every internal of Postgres, it seems simpler to
share the idea here and ask which things can or cannot be done.
Let's imagine there is a new type of index called "weird_btree", where
we trade off simplicity for speed. In almost every mode, we will rely
on VACUUM to put the index back into its optimal state.
Mode 1: on "aminsert", mark the index as INVALID. So if you modify the
table, you need to run REINDEX/CREATE INDEX CONCURRENTLY before doing
a SELECT. This is almost the same as CREATE INDEX CONCURRENTLY; the
main difference is that you don't have to remember to drop the index
before writing. (I don't see much benefit here.)
Mode 2: on "aminsert", put the new entry in a plain, unordered list
instead of the btree. Appending to a list should be faster than
inserting into a big btree, and you'll know later which entries you
missed indexing.
Mode 2.a: on index scan (amrescan, amgettuple), pretend that the list
comes after the btree and output every row, out of order. You will
have to tell Postgres that the index isn't sorted, and it will have to
recheck every row.
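A toy Python sketch of Modes 2/2.a (class and method names are
illustrative; the real access-method API is C functions like aminsert
and amgettuple): inserts only append to an unordered pending list, and
a scan emits the sorted part followed by the unsorted tail, which is
why the executor would have to recheck.

```python
class WeirdBtree:
    """Mode 2/2.a sketch: sorted main index plus unordered pending list."""

    def __init__(self):
        self.btree = []    # stand-in for the sorted btree: (key, tid) pairs
        self.pending = []  # unordered append-only list; O(1) inserts

    def insert(self, key, tid):
        # "aminsert": skip the btree entirely, just append (fast path)
        self.pending.append((key, tid))

    def scan_all(self):
        # "amgettuple": emit the sorted part, then the unsorted tail.
        # The output is NOT globally sorted, so the index must be
        # declared unordered and every row rechecked.
        yield from self.btree
        yield from self.pending


idx = WeirdBtree()
idx.btree = [(1, 'a'), (5, 'b'), (9, 'c')]
idx.insert(3, 'd')
idx.insert(7, 'e')
print(list(idx.scan_all()))
# -> [(1, 'a'), (5, 'b'), (9, 'c'), (3, 'd'), (7, 'e')]
```

Note the output is only sorted within the btree part; the appended
entries trail behind in insertion order.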
Mode 2.b: mark the index invalid instead. During the next VACUUM, sort
the list and insert it into the btree in a bulk operation. If that
succeeds, mark the index valid again.
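The VACUUM-time bulk operation could look roughly like this sketch
(function name and list-based representation are invented for
illustration): sort the pending entries once, then fold them into the
already-sorted index in a single linear merge pass, rather than paying
one btree descent per deferred insert.

```python
def vacuum_merge(btree, pending):
    """Mode 2.b sketch: sort the unordered pending list, then merge it
    into the sorted index in one pass (O(n + m log m) overall)."""
    pending = sorted(pending)
    merged = []
    i = j = 0
    # Standard merge step: take the smaller head from either side.
    while i < len(btree) and j < len(pending):
        if btree[i] <= pending[j]:
            merged.append(btree[i])
            i += 1
        else:
            merged.append(pending[j])
            j += 1
    merged.extend(btree[i:])
    merged.extend(pending[j:])
    return merged  # once this succeeds, the index can be marked valid


print(vacuum_merge([1, 4, 8], [6, 2]))
# -> [1, 2, 4, 6, 8]
```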
Mode 3: on "aminsert", put the new entry in a second btree, leaving
the first one untouched. Because the second btree is new, it will be
small, and writes should be faster. When doing an index scan, read
tuples from both at the same time (like a merge sort). On VACUUM,
merge the second btree into the first. In this mode the index output
stays sorted and there is no need to recheck.
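The Mode 3 scan can be sketched with Python's standard-library
heapq.merge, which interleaves sorted inputs exactly like the merge
step of merge sort (the two lists standing in for the two btrees are,
of course, hypothetical):

```python
import heapq

def merged_scan(main_index, side_index):
    # Both structures are already sorted, so the combined stream
    # comes out sorted too; unlike Mode 2.a, no recheck is needed.
    yield from heapq.merge(main_index, side_index)


# Hypothetical data: a big untouched main btree and a small,
# write-hot side btree.
main = [(1, 't1'), (5, 't2'), (9, 't3')]
side = [(3, 't4'), (7, 't5')]
print(list(merged_scan(main, side)))
# -> [(1, 't1'), (3, 't4'), (5, 't2'), (7, 't5'), (9, 't3')]
```

The cost is that every scan touches two trees; the bet is that the
side tree stays small between VACUUMs.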
Does anyone think this would be an interesting feature for PostgreSQL?
Did I miss something?
PS: Maybe it's also possible to take advantage of clustering and have
indexes whose entries are ranges of TIDs; but I'm not sure whether
this is too exotic, or whether it would make a difference.
Sincerely,
David.
How about a hybrid indexing system, with 2 parts:
(1) the existing index system, which is checked first and is mostly
optimised for read speed. If there are only a few inserts/updates and
the system is not heavily loaded, then it gets modified immediately.
The thresholds for being too busy, and for few enough changes, could
be configurable.
(2) an overflow index optimised for writing. Possibly in memory and
not backed by permanent storage. A crash would require a complete
index rebuild, but only when there were entries in it (or at least
more than some configurable threshold, to allow for cases where some
missing index entries are acceptable).
So where the index is needed for a query, part 1 is checked first, and
then part 2 if necessary.
Have a background process that moves entries from part 2 to part 1
when the system is less busy.
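A minimal sketch of this two-part scheme (all names invented; a real
implementation would live in the backend, not in Python): writes go to
the in-memory overflow list while the system is busy, lookups check
part 1 before part 2, and a background pass drains the overflow into
the read-optimised part.

```python
import bisect

class HybridIndex:
    """Hybrid sketch: sorted read-optimised part 1 + overflow part 2."""

    def __init__(self):
        self.main = []      # part 1: kept sorted for fast lookups
        self.overflow = []  # part 2: unsorted, O(1) appends, not durable

    def insert(self, key, system_busy):
        if system_busy:
            self.overflow.append(key)   # defer the expensive insert
        else:
            bisect.insort(self.main, key)

    def contains(self, key):
        # Part 1 is checked first, then part 2 if necessary.
        i = bisect.bisect_left(self.main, key)
        if i < len(self.main) and self.main[i] == key:
            return True
        return key in self.overflow

    def background_merge(self):
        # Run by a background process when the system is less busy.
        for key in self.overflow:
            bisect.insort(self.main, key)
        self.overflow.clear()
```

The "system_busy" flag stands in for the configurable load threshold;
deciding it well is the hard part of the proposal.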
Cheers,
Gavin
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers