Re: Support tid range scan in parallel?

Cary Huang Thu, 24 Jul 2025 11:47:10 -0700

Hi David

Thank you so much for the review! I have addressed the comments in the
attached v7 patch.


 > 1. In cost_tidrangescan() you're dividing the total costs by the
 > number of workers yet the comment is claiming that's CPU cost. I think
 > this needs to follow the lead of cost_seqscan() and separate out the
 > CPU and IO cost then add the IO cost at the end, after the divide by
 > the number of workers.

I have separated the costs into disk and CPU costs similar to the style in
cost_seqscan().

 > 2. In execParallel.c, could you move the case for T_TidRangeScanState
 > below T_ForeignScanState? What you have right now is now quite
 > following the unwritten standard set out by the other nodes, i.e
 > non-parallel aware nodes are last. A good spot seems to be putting it
 > at the end of the scan types... Custom scan seems slightly misplaced,
 > but probably can ignore that one and put it after T_ForeignScanState

Yes, it's been done.

 > 3. The following comment should mention what behaviour occurs when the
 > field is set to InvalidBlockNumber:

Also addressed 

 > 4. I think the following would be clearer if written using an else if:
 > 
 > if (nallocated >= pbscan->phs_nblocks || (pbscan->phs_numblock !=
 >           InvalidBlockNumber && nallocated >= pbscan->phs_numblock))
 >     page = InvalidBlockNumber; /* all blocks have been allocated */
 > else
 >     page = (nallocated + pbscan->phs_startblock) % pbscan->phs_nblocks;
 > 
 > e.g:
 > 
 > if (nallocated >= pbscan->phs_nblocks)
 >     page = InvalidBlockNumber; /* all blocks have been allocated */
 > else if (pbscan->phs_numblock != InvalidBlockNumber &&
 >             nallocated >= pbscan->phs_numblock)
 >     page = InvalidBlockNumber; /* upper scan limit reached */
 > else
 >     page = (nallocated + pbscan->phs_startblock) % pbscan->phs_nblocks;
 > 
 > That way the comment after the assignment is accurate.

Agreed, and also addressed.

 > 5. For the tests, is there any reason not to reuse the tidrangescan table?

To test TID range scan in parallel, I have a new table created with very low 
fill
factor such that there will be more pages created with small and fixed amount
of tuples in each. Then, the test would do SELECT COUNT(*) to ensure correct
amount of tuples are being counted by the parallel workers during parallel TID
range scan. With this new table, it is easy to ensure the count is correct since
we know how many tuples exist in each page and how many pages are to be
scanned based on the WHERE predicates. Reusing the tidrangescan table
would be hard to tell if the count is correct in my case. 


thank you!
Cary.

v7-0001-add-parallel-tid-rangescan.patch
Description: Binary data

Re: Support tid range scan in parallel?

Reply via email to