On 04/02/2016 09:57 PM, Tom Lane wrote:
Konstantin Knizhnik <k.knizh...@postgrespro.ru> writes:
Attached please find patch for "ALTER INDEX ... WHERE ..." clause.
It is now able to handle all three possible situations:
1. Making index partial (add WHERE condition to the ordinary index)
2. Extend partial index range (less restricted index predicate)
3. Arbitrary change of partial index predicate
I've not been following this thread previously, but this proposal
scares me quite a lot. I am certain there are places in our code
that assume that the properties of an index don't change after it's
been created. One area that this almost certainly breaks is HOT updates:
adding a previously-unindexed column to an index predicate might break
existing HOT chains, and I see nothing in this patch that could deal
with that. I seem to recall there are other places that would be
broken by changing an index's DDL definition after creation, but can't
recall specifics right now.
I am also, frankly, not seeing a use-case for this functionality that
would justify trying to find and remove those assumptions.
There's a lot of things I don't care for about the way the patch is
written, in particular its willingness to use SPI (which opens a lot of
potential for search-path problems, failure to see uncommitted tuples,
etc). But we need not get to that if we don't believe the functionality
can work.
Thank you for review, Tom.
I completely agree with all your arguments against this patch.
I have proposed this patch mostly as prove of concept.
Yes, I have not take in account hot updates and may be there are other possible
issues which I not considered.
The main question is whether the proposed way of batch update of indexes is
viable or it is conceptually wrong approach
(because it beaks assumption that index properties can't be changed or because
it is not convenient to use...).
I hope that everybody agree that maintaining of indexes is the main limiting
factor for insert speed.
If table has no indexes, then insert speed can be as high as disk write speed
(100Mb/sec or 1000 for SSD).
So if size of record is about 10 bytes, then we can get about 10 millions TPS.
But presence of indexes will dramatically change this picture: if database is
large enough so that even index can not fit in memory
and records are inserted in random key order, then each insert in index will
require reading of 3-4 pages from random locations on the disk.
With average HDD positioning time 10 msec, we get 100 reads per second and ...
20-30 TPS. It is just with one index.
If we have 10 indexes, then TPS can be less than fingers on a hand.
Certainly it is very pessimistic estimation.
But still it is true that we can not provide good insert speed if we have to
update indexes immediately.
And without indexes we can not efficiently execute most of queries.
I do not see any way in Postgres to solve this problem now. The hack with
creating materialized views requires a lot of extra time and space.
It will not work for really large table.
So we need some way to postpone insertion of new records in the index. Then we can do such insertion in background or in idle time (at night), try to use bulk insert if index implementation supports it (for example sorting records by key before insert can
significantly increase locality and so improve speed of insert in index). But the principle moment here is that such delayed update of index violates the main RDBMS rule that results of query execution with and without indexes should be the same. The trick
with partial indexes allows to eliminate this contradiction. But it requires more actions from user. So are users ready to do some exatra job just because of "idealogical" reasons? Because if user wants to have delayed update of indexes, then he actually
approves that it is ok for him that query results may not include some most recent updates.
Another aspect is which database objects are allowed to be altered and which
not. Right now with tables we can alter almost everything.
With indexes - almost nothing. It is assumed that index can always be reconstructed. But for very big table reconstruction of indexes from scratch will take unacceptable amount of time. So should we make it possible to alter some index characteristics
which do not require to rebuild index from scratch (and it is definitely true for partial index predicate)? Or price of supporting it is so high, that it can not be compensated by obtained benefits?
So what do you think?
1. Should I continue work in this direction and fix all possible issues with
hot updates,... to make it possible to alter partial index predicates and
support batch inserts i this way?
2. Or it is better to just add extra option to the index, allowing it to be slightly out-of-sync? It will allow, for example, to eliminate pending list for GIN which can cause very significant degradation of query speed, while for most full-text search
engine it is acceptable that changes are not immediately visible.
Certainly much more work is required here except of just adding a new index
option...
3. Or both of the approaches are wrong and we should leave everything as it is?
4. Or may be there is some other approach which is more acceptable?
This patch includes src/bin/insbench utility for testing insert
performance. It can be easily excluded from the patch to reduce it size.
C++ is not likely to get accepted into our tree ...
regards, tom lane
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers