On Sat, Apr 1, 2017 at 9:03 AM, anant khandelwal <anantbie...@gmail.com> wrote:
> From my findings the dependencies which create the anomalies in Snapshot
> isolation are:
>
> wr-dependencies: if T1 writes a version of an object, and T2
> reads that version, then T1 appears to have executed before T2
>
> ww-dependencies: if T1 writes a version of some object, and
> T2 replaces that version with the next version, then T1 appears
> to have executed before T2
>
> ww-dependencies: if T1 writes a version of some object, and
> T2 replaces that version with the next version, then T1 appears
> to have executed before T2

I think you meant to replace one of those ww-dependencies sections
with rw-dependencies -- which are, after all, the ones that cause
all the trouble :-).

> but due to the rw dependency caused by any insert into the index indexes
> other than the btree acquires relation level lock which causes serialization
> failure.

That's both a bit strong and a bit narrow.  A better statement might
be that the relation-level predicate locks can lead to false positive
serialization failures.  After all, detection of a single write to
the index doesn't all by itself lead to a serialization failure, and
the problem when it does is that it might be a "false positive" --
in some cases where the serialization failure occurs there would not
actually be a serialization anomaly without that failure.

> So the main task is to implement page-level predicate locking in the
> remaining core index AMs
>
> * Gist Index
>
> B+tree index acquires predicate locks only on leaf pages

Well, if memory serves, we sometimes need to lock at the relation
level (e.g., an empty index), but as far as *page* locks go, that
seems correct because a scan doesn't exit early during the descent
to the leaf level.  The point is that predicate locks on indexes are
intended to lock gaps between the index entries found.
Any index entry read at the leaf level is dead (but not yet vacuumed
away), points to a tuple not visible to the scanning snapshot, or
points to a tuple which *is* visible to the snapshot.  In the first
two cases we don't need any predicate lock; in the last one the lock
on the heap tuple covers things.  What we need the locks on the
indexes for is to cover the "gaps" into which later writes may occur.
So the need is to ensure that during any insert of an index tuple
that points to a heap tuple we might have read had it existed at the
time of the index scan, we run into a predicate lock to allow us to
recognize the rw-dependency.

> But for Gist index, we have
> to consider locking at each index level as index tuples can be stored in
> buffers associated with internal pages or in leaf pages.

Yes, because a gist scan can return with a "not found" indication
before it gets to the leaf level.

> So, the functions where we need to insert a call for
>
> 1. PredicateLockPage()
>
> -> gistScanPage()
> after acquiring a shared lock on a buffer
>
> 2. CheckForSerializableConflictIn()
>
> -> gistdoinsert()
> after acquiring an exclusive lock on a target buffer and before
> inserting a tuple
>
> 3. PredicateLockPageSplit()
>
> -> gistplacetopage()
>
> If there is not enough space for insertion, we need to copy predicate
> locks from an old page to all new pages generated after a successful
> split operation.
>
> PredicateLockPageSplit(Relation relation, BlockNumber oldblkno,
> BlockNumber newblkno) is used by b+-tree where two pages are involved
> in a split operation.  For Gist index, we can define a similar function
> where more than one page can be generated after a split operation.

We could always call the existing function more than once, but there
might be enough performance benefit to justify a new function with
an array of page references.  I would want to do it without changing
the API first, and if time permits benchmark that versus a new
function.
Without measurable benefit, I don't want to complicate the API.

> * Gin Index
>
> Gin index consists of a B-tree index over key values, where each key is
> an element of some indexed items and each tuple in a leaf page contains
> either a posting list if the list is small enough or a pointer to a
> posting tree.
>
> 1. PredicateLockPage()
>
> -> startscanentry()
> before calling collectMatchBitmap()
>
> -> scanPostingTree()
> after acquiring a shared lock on a leaf page
>
> 2. CheckForSerializableConflictIn()
>
> -> ginentryinsert()
>
> -> gininsertitempointers()
> in case of insertion in a posting tree
>
> 3. PredicateLockPageSplit()
>
> -> dataBeginPlacetoPageLeaf()
> after calling dataPlacetoPageLeafSplit()
>
> * Hash Index
>
> 1. PredicateLockPage()
>
> -> hashgettuple()
> -> _hash_first()
> -> _hash_readnext()
> -> _hash_readprev()
>
> 2. CheckForSerializableConflictIn()
>
> -> _hash_doinsert()
>
> 3. PredicateLockPageSplit()

I have not looked in detail at this point, but that seems plausible
and indicates admirable research effort in drafting this proposal.

> This is just an idea; also I understand that page-level predicate
> locking exists in the btree AM, where it required the addition of 17
> function calls to implement, but remains unimplemented in the gin,
> gist, spgist, brin, and hash index AMs.  So we need to insert function
> calls at other places also.

Exactly.

> Also tell me can we evaluate the performance of PostgreSQL on the
> following workloads
>
> SIBENCH microbenchmark
> TPC-C++

Those are good, but pgbench (both read-only and read-write loads) is
popular to include because any pg hacker can duplicate the tests.

https://www.postgresql.org/docs/current/static/pgbench.html

--
Kevin Grittner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers