On Tue, Mar 6, 2018 at 11:26 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: > On Mon, Mar 5, 2018 at 8:58 AM, Thomas Munro > <thomas.mu...@enterprisedb.com> wrote: >> On Sun, Mar 4, 2018 at 12:53 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: >>> Yes, but I think it would be better if we call this once we are sure >>> that at least one tuple from the old bucket has been transferred >>> (consider if all tuples in the old bucket are dead). Apart from this, >>> I think this patch has missed handling the cases where we scan the >>> buckets when the split is in progress. In such cases, we scan both >>> old and new bucket, so I think we need to ensure that we take >>> PredicateLock on both the buckets during such scans. >> >> Hmm. Yeah. >> >> So, in _hash_first(), do you think we might just need this? >> >> if (H_BUCKET_BEING_POPULATED(opaque)) >> { >> ... >> old_blkno = _hash_get_oldblock_from_newbucket(rel, bucket); >> ... >> old_buf = _hash_getbuf(rel, old_blkno, HASH_READ, LH_BUCKET_PAGE); >> + PredicateLockPage(rel, BufferGetBlockNumber(old_buf), >> scan->xs_snapshot); >> TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(old_buf)); >> >> That is, if you begin scanning a 'new' bucket, we remember the old >> bucket and go and scan that too, so we'd better predicate-lock both up >> front (or I suppose we could do it later when we visit that page, but >> here it can be done in a single place). >> > > Yeah, that can work, but I am slightly worried that we might actually > never scan the old bucket (say for queries with Limit clause) in which > case it might give false positives for insertions in old buckets. >
I have changed the patch to address this point by acquiring predicate lock in _hash_readnext where it will acquire the lock only when it tries to scan the old bucket. I have also addressed another problem related to transfer of predicate locks during split such that it will transfer locks only when there is any tuple transferred from old to the new bucket. > >> I'm wondering how to test all this. I'm thinking about a program that >> repeatedly creates a hash index and then slowly adds more things to it >> so that buckets split (maybe using distinct keys carefully crafted to >> hit the same bucket?), while concurrently hammering it with a ton of >> scans and then ... somehow checking correctness... >> > > Yeah, that will generate the required errors, but not sure how to > verify correctness. One idea could be that when the split is in > progress, we somehow stop it in-between (say by cancel request) and > then run targeted selects and inserts on the bucket being scanned and > bucket being populated. > I have verified the bucket split scenario manually as below: Setup ------------ create table hash_tbl(id int4, p integer); insert into hash_tbl (id, p) select g, 10 from generate_series(1, 10) g; Analyze hash_tbl; create index hash_idx on hash_tbl using hash(p); Session-1 ---------------- begin isolation level serializable; set enable_seqscan=off; set enable_bitmapscan=off; set enable_indexonlyscan=on; select sum(p) from hash_tbl where p=10; sum ----- 100 (1 row) insert into hash_tbl (id, p) select g, 10 from generate_series(10, 1000) g; -- Via debugger, stop in _hash_splitbucket at line 1283 {.. LockBuffer(bucket_obuf, BUFFER_LOCK_EXCLUSIVE); ...} By this time split of bucket 1 is done but the split flag is not cleared. So, predicate lock from bucket-1 have been transferred to bucket-3 (new bucket). Session-2 ---------------- begin isolation level serializable; set enable_seqscan=off; set enable_bitmapscan=off; set enable_indexonlyscan=on; select sum(p) from hash_tbl where p=10; sum ----- 100 (1 row) Session-1 -------------- Commit; Session-2 ---------- postgres=# insert into hash_tbl (id, p) select g, 10 from generate_series(51, 60) g; ERROR: could not serialize access due to read/write dependencies among transactions DETAIL: Reason code: Canceled on identification as a pivot, during write. HINT: The transaction might succeed if retried. It got conflict while inserting in the new bucket (bucket number -3). I think this patch now addresses all the open issues. Let me know what do you think about it? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Predicate-Locking-in-hash-index_v6.patch
Description: Binary data