As for initscan, it looks to me like the code and the comments don't match (obviously I'm wrong, which is why I'm asking).
    /*
     * Determine the number of blocks we have to scan.
     *
     * It is sufficient to do this once at scan start, since any tuples added
     * while the scan is in progress will be invisible to my snapshot anyway.

Andy: I can follow it up to here.

     * (That is not true when using a non-MVCC snapshot.  However, we couldn't
     * guarantee to return tuples added after scan start anyway,

Andy: For any isolation level other than READ COMMITTED we should not read those tuples; for READ UNCOMMITTED we can still do the same. So I think I understand this part.

     * since they
     * might go into pages we already scanned.  To guarantee consistent
     * results for a non-MVCC snapshot, the caller must hold some higher-level
     * lock that ensures the interesting tuple(s) won't change.)
     */

Andy: I can't understand what "To guarantee consistent results for a non-MVCC snapshot" means. It looks like something needs to be handled differently for a non-MVCC snapshot. So far, my understanding is that we CAN determine the number of blocks only once for an MVCC snapshot, which should be the very common case.

    if (scan->rs_parallel != NULL)
        scan->rs_nblocks = scan->rs_parallel->phs_nblocks;
    else
        scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_rd);

Andy: However, I see that the code checks the number of blocks at every rescan regardless of the snapshot type, which I can't understand.

This behavior doesn't cause me any trouble (I might care about it for Index Scan, but it looks like Index Scan doesn't need to do that), so I am asking just for education purposes.

Thanks!

-- 
Best Regards
Andy Fan
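
P.S. To make my reading of the comment concrete, here is a toy model of it. This is not PostgreSQL code: ToyRelation, ToyScan, toy_initscan and toy_next_block are hypothetical names I made up, and the "relation" is only a block counter. It shows just the two things I am asking about: the block count is captured when the scan (re)starts, so blocks added while the scan is running are not visited, and a rescan that calls the init function again picks up the new count.

```c
#include <assert.h>

/* Hypothetical stand-in for a relation: only its current size in blocks. */
typedef struct ToyRelation
{
    int nblocks;
} ToyRelation;

/* Hypothetical stand-in for the scan state. */
typedef struct ToyScan
{
    ToyRelation *rel;
    int          nblocks;   /* block count captured at scan (re)start */
    int          next;      /* next block number to return */
} ToyScan;

/* Capture the relation size once, at scan start or rescan. */
static void
toy_initscan(ToyScan *scan, ToyRelation *rel)
{
    scan->rel = rel;
    scan->nblocks = rel->nblocks;
    scan->next = 0;
}

/* Return the next block number to read, or -1 when the scan is done. */
static int
toy_next_block(ToyScan *scan)
{
    if (scan->next >= scan->nblocks)
        return -1;
    return scan->next++;
}
```

In this toy, extending the relation in the middle of a scan does not change what the running scan visits; only calling toy_initscan again does. Whether that second call is really required for an MVCC snapshot is exactly my question.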