On Tue, 30 Apr 2024 at 10:36, Cary Huang <cary.hu...@highgo.ca> wrote: > In one of our migration scenarios, we rely on tid range scan to migrate huge > table from one database to another once the lower and upper ctid bound is > determined. With the support of parallel ctid range scan, this process could > be done much quicker.
I would have thought that the best way to migrate would be to further divide the TID range into N segments and run N queries, one per segment to get the data out. >From a CPU point of view, I'd hard to imagine that a SELECT * query without any other items in the WHERE clause other than the TID range quals would run faster with multiple workers than with 1. The problem is the overhead of pushing tuples to the main process often outweighs the benefits of the parallelism. However, from an I/O point of view on a server with slow enough disks, I can imagine there'd be a speedup. > The attached patch is my approach to add parallel ctid range scan to > PostgreSQL's planner and executor. In my tests, I do see an increase in > performance using parallel tid range scan over the single worker tid range > scan and it is also faster than parallel sequential scan covering similar > ranges. Of course, the table needs to be large enough to reflect the > performance increase. > > below is the timing to complete a select query covering all the records in a > simple 2-column table with 40 million records, > > - tid range scan takes 10216ms > - tid range scan with 2 workers takes 7109ms > - sequential scan with 2 workers takes 8499ms Can you share more details about this test? i.e. the query, what the times are that you've measured (EXPLAIN ANALYZE, or SELECT, COPY?). Also, which version/commit did you patch against? I was wondering if the read stream code added in v17 would result in the serial case running faster because the parallelism just resulted in more I/O concurrency. Of course, it may be beneficial to have parallel TID Range for other cases when more row filtering or aggregation is being done as that requires pushing fewer tuples over from the parallel worker to the main process. It just would be good to get to the bottom of if there's still any advantage to parallelism when no filtering other than the ctid quals is being done now that we've less chance of having to wait for I/O coming from disk with the read streams code. David