On Mon, Feb 13, 2017 at 5:47 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Sat, Feb 11, 2017 at 6:35 PM, Amit Kapila <amit.kapil...@gmail.com> wrote: >>>>> Why can't we rely on _bt_walk_left? >>>> The reason is mentioned in comments, but let me try to explain with >>>> some example. When you reach that point of code, it means that either >>>> the current page (assume page number is 10) doesn't contain any >>>> matching items or it is a half-dead page, both of which indicates that >>>> we have to move to the previous page. Now, before checking if the >>>> current page contains matching items, we signal parallel machinery >>>> (via _bt_parallel_release) to allow workers to read the previous page >>>> (assume previous page number is 9). So it is quite possible that >>>> after deciding that current page (page number 10) doesn't contain any >>>> matching tuples if we directly move to the previous page (in this case >>>> it will be 9) by using _bt_walk_left, some other worker would have >>>> read page 9. In short, if we directly use _bt_walk_left(), then we >>>> are prone to returning some of the values twice as multiple workers >>>> can read the same page. >>> But ... the entire point of the seize-and-release stuff is to avoid >>> this problem. You're suppose to seize the scan, read the current >>> page, walk left, store the page you find in the scan, and then release >>> the scan. >> Exactly and that is what is done in the patch. Basically, if we found >> that the current page is half-dead or it doesn't contain any matching >> items, then release the current buffer, seize the scan, read the >> current page, walk left and so on. I am slightly confused here >> because it seems both of us agree on what is the right thing to do and >> according to me that is how it is implemented. Are you just ensuring >> about whether I have implemented as discussed or do you see a problem >> with the way it is implemented? > > Well, before, I thought you said that relying entirely on > _bt_walk_left couldn't work because then two people might end up > running it at the same time, and that would cause problems. But if > you can only run _bt_walk_left while you've got the scan seized, then > that can't happen. Evidently I'm missing something here. >
I think the comment at that place is not as clear as it should be. So how about changing it as below: Existing comment: -------------------------- /* * For parallel scans, get the last page scanned as it is quite * possible that by the time we try to fetch previous page, other * worker has also decided to scan that previous page. So we * can't rely on _bt_walk_left call. */ Modified comment: -------------------------- /* * For parallel scans, it is quite possible that by the time we try to fetch * the previous page, another worker has also decided to scan that * previous page. So to avoid that we need to get the last page scanned * from shared scan descriptor before calling _bt_walk_left. */ -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers