Hello David

Thank you so much for testing the patch more thoroughly and for sharing your 
patch.
I see that you moved the setting of ParallelBlockTableScanDesc fields from
heap_set_tidrange() to be within heap_setscanlimits() so that the missing case 
of
(ItemPointerCompare(&highestItem, &lowestItem) < 0) can be covered. This
placement should be fine.

The workers, however, still end up in heap_set_tidrange() during a call to
TidRangeNext() via table_rescan_tidrange() at the beginning of the scan as you 
have
described. Each of the worker does attempt to update the tid range to basically 
the
same range to the parallel context shared among the workers in shared memory,
which may seem a little racy here. 

The problem is that the logic in TidRangeNext() is mostly designed as 
non-parallel
mode and that  table_rescan_tidrange() is incorrectly called in parallel mode, 
causing
each worker to try to update TID ranges at the same time.

The initialization of scan descriptor, rescan and setting TID ranges in fact 
take place
at different places between parallel and non-parallel modes. In non-parallel 
mode,
everything takes place in TidRangeNext() while in parallel mode, they take 
place in
ExecTidRangeScanInitializeDSM() and ExecTidRangeScanReInitializeDSM() and they
are called only by the leader.

I have updated TidRangeNext() to not do any rescan or set TID limit in parallel 
mode.

I have also updated ExecTidRangeScanInitializeDSM() to set the TID range after
initializing parallel scan descriptor, which is shared to the workers via 
shared memory.
Similarly, in ExecTidRangeScanReInitializeDSM(), a new TID range will be set 
after the
re-initialization of parallel scan descriptor during the rescan case.
ExecTidRangeScanInitializeWorker() is called by each parallel worker and is also
updated such that it will not set the TID limits again. 

Since ExecTidRangeScanInitializeDSM() and ExecTidRangeScanReInitializeDSM() are
only called by the leader process, there will not be any concurrent call to
heap_setscanlimits(), so no locking is needed there. 

On top of the 2 patches you shared, I have attached the third patch with the 
changes
I describe above, so it is easy to spot the new diffs.

Thank you once again for your review!

Cary

Attachment: v9-0002-fixup-v7-parallel-TID-range-scan-patch.patch
Description: Binary data

Attachment: v9-0003-fix-the-incorrect-call-to-scan_set_tidrange.patch
Description: Binary data

Attachment: v9-0001-v7-parallel-TID-range-scan-patch.patch
Description: Binary data

Reply via email to