>> 5. Comment for _bt_parallel_seize() says: >> "False indicates that we have reached the end of scan for >> current scankeys and for that we return block number as P_NONE." >> >> What is the reason to check (blkno == P_NONE) after checking (status == false) >> in _bt_first() (see code below)? If comment is correct >> we'll never reach _bt_parallel_done() >> >> + blkno = _bt_parallel_seize(scan, &status); >> + if (status == false) >> + { >> + BTScanPosInvalidate(so->currPos); >> + return false; >> + } >> + else if (blkno == P_NONE) >> + { >> + _bt_parallel_done(scan); >> + BTScanPosInvalidate(so->currPos); >> + return false; >> + } >> >The first time master backend or worker hits last page (calls this >API), it will return P_NONE and after that any worker tries to fetch >next page, it will return status as false. I think we can expand a >comment to explain it clearly. Let me know, if you need more >clarification, I can explain it in detail.
Probably this was confusing because we have not mentioned that P_NONE can be returned even when status = TRUE and not just when status is false. I think, the comment above the function can be modified as follows, + /* + * True indicates that the block number returned is either valid including P_NONE + * and scan is continued or block number is invalid and scan has just + * begun. Thank you, Rahila Syed On Thu, Dec 22, 2016 at 9:49 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: > On Wed, Dec 21, 2016 at 8:46 PM, Anastasia Lubennikova > <lubennikov...@gmail.com> wrote: > > The following review has been posted through the commitfest application: > > make installcheck-world: tested, passed > > Implements feature: tested, passed > > Spec compliant: tested, passed > > Documentation: tested, passed > > > > Hi, thank you for the patch. > > Results are very promising. Do you see any drawbacks of this feature or > something that requires more testing? > > > > I think you can focus on the handling of array scan keys for testing. > In general, one of my colleagues has shown interest in testing this > patch and I think he has tested as well but never posted his findings. > I will request him to share his findings and what kind of tests he has > done, if any. > > > I'm willing to oo a review. > > Thanks, that will be helpful. > > > > I saw the discussion about parameters in the thread above. And I agree > that we'd better concentrate > > on the patch itself and add them later if necessary. > > > > 1. Can't we simply use "if (scan->parallel_scan != NULL)" instead of > xs_temp_snap flag? > > > > + if (scan->xs_temp_snap) > > + UnregisterSnapshot(scan->xs_snapshot); > > > > I agree with what Rober has told in his reply. We do same way for > heap, refer heap_endscan(). > > > I must say that I'm quite new with all this parallel stuff. If you give > me a link, > > where to read about snapshots for parallel workers, my review will be > more helpful. > > > > You can read transam/README.parallel. Refer "State Sharing" portion > of README to learn more about it. > > > Anyway, it would be great to have more comments about it in the code. > > > > We are sharing snapshot to ensure that reads in both master backend > and worker backend can use the same snapshot. There is no harm in > adding comments, but I think it is better to be consistent with > similar heapam code. After reading README.parallel, if you still feel > that we should add more comments in the code, then we can definitely > do that. > > > 2. Don't you mind to rename 'amestimateparallelscan' to let's say > 'amparallelscan_spacerequired' > > or something like this? > > Sure, I am open to other names, but IMHO, lets keep "estimate" in the > name to keep it consistent with other parallel stuff. Refer > execParallel.c to see how widely this word is used. > > > As far as I understand there is nothing to estimate, we know this size > > for sure. I guess that you've chosen this name because of > 'heap_parallelscan_estimate'. > > But now it looks similar to 'amestimate' which refers to indexscan cost > for optimizer. > > That leads to the next question. > > > > Do you mean 'amcostestimate'? If you want we can rename it > amparallelscanestimate to be consistent with amcostestimate. > > > 3. Are there any changes in cost estimation? > > > > Yes. > > > I didn't find related changes in the patch. > > Parallel scan is expected to be faster and optimizer definitely should > know that. > > > > You can find the relavant changes in > parallel_index_opt_exec_support_v2.patch, refer cost_index(). > > > 4. + uint8 ps_pageStatus; /* state of scan, see below */ > > There is no desciption below. I'd make the comment more helpful: > > /* state of scan. See possible flags values in nbtree.h */ > > makes sense. Will change. > > > And why do you call it pageStatus? What does it have to do with page? > > > > During scan this tells us whether next page is available for scan. > Another option could be to name it as scanStatus, but not sure if that > is better. Do you think if we add a comment like "indicates whether > next page is available for scan" for this variable then it will be > clear? > > > 5. Comment for _bt_parallel_seize() says: > > "False indicates that we have reached the end of scan for > > current scankeys and for that we return block number as P_NONE." > > > > What is the reason to check (blkno == P_NONE) after checking (status == > false) > > in _bt_first() (see code below)? If comment is correct > > we'll never reach _bt_parallel_done() > > > > + blkno = _bt_parallel_seize(scan, &status); > > + if (status == false) > > + { > > + BTScanPosInvalidate(so->currPos); > > + return false; > > + } > > + else if (blkno == P_NONE) > > + { > > + _bt_parallel_done(scan); > > + BTScanPosInvalidate(so->currPos); > > + return false; > > + } > > > > The first time master backend or worker hits last page (calls this > API), it will return P_NONE and after that any worker tries to fetch > next page, it will return status as false. I think we can expand a > comment to explain it clearly. Let me know, if you need more > clarification, I can explain it in detail. > > > 6. To avoid code duplication, I would wrap this into the function > > > > + /* initialize moreLeft/moreRight appropriately for scan > direction */ > > + if (ScanDirectionIsForward(dir)) > > + { > > + so->currPos.moreLeft = false; > > + so->currPos.moreRight = true; > > + } > > + else > > + { > > + so->currPos.moreLeft = true; > > + so->currPos.moreRight = false; > > + } > > + so->numKilled = 0; /* just paranoia */ > > + so->markItemIndex = -1; /* ditto */ > > > > Okay, I think we can write a separate function (probably inline > function) for above. > > > And after that we can also get rid of _bt_parallel_readpage() which only > > bring another level of indirection to the code. > > > > See, this function is responsible for multiple actions like > initializing moreLeft/moreRight positions, reading next page, dropping > the lock/pin. So replicating all these actions in the caller will > make the code in caller less readable as compared to now. Consider > this point and let me know your view on same. > > > 7. Just a couple of typos I've noticed: > > > > * Below flags are used indicate the state of parallel scan. > > * Below flags are used TO indicate the state of parallel scan. > > > > * On success, release lock and pin on buffer on success. > > * On success release lock and pin on buffer. > > > > Will fix. > > > 8. I didn't find a description of the feature in documentation. > > Probably we need to add a paragraph to the "Parallel Query" chapter. > > > > Yes, I am aware of that and I think it makes sense to add it now > rather than waiting until the end. > > > I will send another review of performance until the end of the week. > > > > Okay, you can refer Rafia's mail above for non-default settings she > has used in her performance tests with TPC-H. > > > -- > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com >