On Mon, Mar 5, 2018 at 8:58 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Sun, Mar 4, 2018 at 12:53 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> Yes, but I think it would be better if we call this once we are sure
>> that at least one tuple from the old bucket has been transferred
>> (consider if all tuples in the old bucket are dead). Apart from this,
>> I think this patch has missed handling the cases where we scan the
>> buckets when the split is in progress. In such cases, we scan both
>> old and new bucket, so I think we need to ensure that we take
>> PredicateLock on both the buckets during such scans.
>
> Hmm. Yeah.
>
> So, in _hash_first(), do you think we might just need this?
>
>     if (H_BUCKET_BEING_POPULATED(opaque))
>     {
>         ...
>         old_blkno = _hash_get_oldblock_from_newbucket(rel, bucket);
>         ...
>         old_buf = _hash_getbuf(rel, old_blkno, HASH_READ, LH_BUCKET_PAGE);
> +       PredicateLockPage(rel, BufferGetBlockNumber(old_buf),
>                           scan->xs_snapshot);
>         TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(old_buf));
>
> That is, if you begin scanning a 'new' bucket, we remember the old
> bucket and go and scan that too, so we'd better predicate-lock both up
> front (or I suppose we could do it later when we visit that page, but
> here it can be done in a single place).
>
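To restate the proposal in context: expanded a little, the fragment
above sits in _hash_first() (src/backend/access/hash/hashsearch.c)
roughly as below. This is a sketch from memory of the current sources
with the proposed call spliced in, abbreviated and untested, not the
actual patch:

    if (H_BUCKET_BEING_POPULATED(opaque))
    {
        /*
         * A split is in progress and this scan started on the new
         * bucket, so tuples that belong here may still be in the old
         * bucket; the scan will visit both.
         */
        BlockNumber old_blkno;
        Buffer      old_buf;

        old_blkno = _hash_get_oldblock_from_newbucket(rel, bucket);

        /* Release the lock on the new bucket before taking the old one. */
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);

        old_buf = _hash_getbuf(rel, old_blkno, HASH_READ, LH_BUCKET_PAGE);

        /*
         * Proposed addition: predicate-lock the old bucket's primary
         * page up front, so that an insert into either bucket conflicts
         * with this scan under SERIALIZABLE.
         */
        PredicateLockPage(rel, BufferGetBlockNumber(old_buf),
                          scan->xs_snapshot);

        TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(old_buf));
        ...
    }
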
Yeah, that can work, but I am slightly worried that we might actually
never scan the old bucket (say for queries with a Limit clause), in
which case the up-front lock could raise false serialization conflicts
for insertions into the old bucket. (A sketch of the deferred
alternative is at the end of this mail.)

> What if we begin scanning an 'old' bucket that is being split? I
> think we'll only do that for tuples that actually belong in the old
> bucket after the split, so no need to double-lock? And I don't think
> a split could begin while we are scanning. Do I have that right?
>

Right.

> As for inserting, I'm not sure if any special treatment is needed, as
> long as the scan code path (above) and the split code path are
> correct. I'm not sure though.
>

I also don't think we need any special handling for insertions.

> I'm wondering how to test all this. I'm thinking about a program that
> repeatedly creates a hash index and then slowly adds more things to it
> so that buckets split (maybe using distinct keys carefully crafted to
> hit the same bucket?), while concurrently hammering it with a ton of
> scans and then ... somehow checking correctness...
>

Yeah, that will generate the required serialization failures, but I am
not sure how to verify correctness. One idea could be that when the
split is in progress, we somehow stop it in between (say via a cancel
request) and then run targeted selects and inserts on the bucket being
scanned and the bucket being populated.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
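PS: To make the deferred-lock alternative concrete, here is a rough,
untested sketch of where the call could move instead: the branch of
_hash_readnext() (hashsearch.c) that switches the scan over to the
bucket being split. Structure follows the current sources from memory;
this is an illustration of the idea, not a proposed patch:

    else if (so->hashso_buc_populated && !so->hashso_buc_split)
    {
        /*
         * We have hit the end of the new bucket; continue the scan in
         * the old bucket (being split), whose pin was taken in
         * _hash_first().
         */
        *bufp = so->hashso_split_bucket_buf;
        Assert(BufferIsValid(*bufp));
        LockBuffer(*bufp, BUFFER_LOCK_SHARE);

        /*
         * Deferred variant: predicate-lock the old bucket's primary
         * page only now, when the scan actually reads it. A Limit
         * query that stops inside the new bucket then never locks the
         * old one, so inserts there cannot raise false conflicts
         * against it.
         */
        PredicateLockPage(rel, BufferGetBlockNumber(*bufp),
                          scan->xs_snapshot);

        so->hashso_buc_split = true;
    }

The lock is still taken before any tuple on the old bucket's page is
read, which I believe is all that SSI requires here.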