On Mon, Dec 28, 2020 at 3:14 PM vignesh C <vignes...@gmail.com> wrote:
>
> Attached is a patch that was used for the same. The patch is written
> on top of the parallel copy patch.
> The design Amit, Andres & myself voted for that is the leader
> identifying the line bound design and sharing it in shared memory is
> performing better.

Hi Hackers,

I see the following open problems with the parallel copy feature:

1) Leader identifying the line/tuple boundaries from the file and
letting the workers pick them up and insert in parallel vs. leader
reading the file and letting the workers identify the line/tuple
boundaries and insert.
2) Determining parallel safety of partitioned tables.
3) Bulk extension of the relation while inserting, i.e. adding more
than one extra block to the relation in RelationAddExtraBlocks.

Please let me know if I'm missing anything.

For (1) - Vignesh's experiments above show that "leader identifying
the line/tuple boundaries from the file, letting the workers pick,
insert parallelly" fares better.

For (2) - this is being discussed in another thread (I'm not sure what
the status of that thread is). How about we take this feature without
support for partitioned tables, i.e. parallel copy is disabled for
partitioned tables? Once the other discussion reaches a logical end,
we can come back and enable parallel copy for partitioned tables.

For (3) - we need a way to extend the relation or add new blocks
quickly. fallocate might help here; I'm not sure who's working on it,
others can comment better here.

Can we take the "parallel copy" feature forward, of course with some
restrictions in place?

Thoughts?

Regards,
Bharath Rupireddy.
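
PS: To make the design in (1) concrete, here is a rough standalone
sketch of "leader identifies the line boundaries, workers pick them
up". It is not code from the patch. LineBoundaries, MAX_LINES,
leader_find_lines and worker_claim_line are hypothetical names made up
for illustration, an ordinary struct stands in for shared memory, and
lines are split on a bare '\n' only (no CSV quoting or escape
handling).

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_LINES 1024			/* hypothetical fixed capacity for this sketch */

/* Hypothetical stand-in for a structure that would live in shared memory. */
typedef struct LineBoundaries
{
	size_t		start[MAX_LINES];	/* byte offset where each line starts */
	size_t		len[MAX_LINES];		/* line length, excluding the newline */
	size_t		nlines;				/* entries filled in by the leader */
	atomic_size_t next;				/* next entry a worker may claim */
} LineBoundaries;

/* Leader: scan the buffer once and record the boundaries of every line. */
void
leader_find_lines(const char *buf, size_t buflen, LineBoundaries *lb)
{
	size_t		line_start = 0;

	lb->nlines = 0;
	atomic_init(&lb->next, 0);

	for (size_t i = 0; i < buflen && lb->nlines < MAX_LINES; i++)
	{
		if (buf[i] == '\n')
		{
			lb->start[lb->nlines] = line_start;
			lb->len[lb->nlines] = i - line_start;
			lb->nlines++;
			line_start = i + 1;
		}
	}

	/* Record a final line that has no trailing newline. */
	if (line_start < buflen && lb->nlines < MAX_LINES)
	{
		lb->start[lb->nlines] = line_start;
		lb->len[lb->nlines] = buflen - line_start;
		lb->nlines++;
	}
}

/*
 * Worker: atomically claim the next unclaimed line. Returns 1 and sets
 * *start / *len on success, or 0 once every line has been handed out.
 */
int
worker_claim_line(LineBoundaries *lb, size_t *start, size_t *len)
{
	size_t		idx = atomic_fetch_add(&lb->next, 1);

	if (idx >= lb->nlines)
		return 0;

	*start = lb->start[idx];
	*len = lb->len[idx];
	return 1;
}
```

Each worker would call worker_claim_line() in a loop and parse/insert
only the bytes it was handed, so no two workers touch the same line.
In the alternative design the workers would have to run the scanning
loop themselves on the chunks the leader reads.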