Re: Parallel copy

Bharath Rupireddy Sun, 06 Mar 2022 23:57:15 -0800

On Mon, Dec 28, 2020 at 3:14 PM vignesh C <[email protected]> wrote:
>
> Attached is a patch that was used for the same. The patch is written
> on top of the parallel copy patch.
> The design that Amit, Andres and I voted for, in which the leader
> identifies the line boundaries and shares them in shared memory, is
> performing better.


Hi Hackers, I see the following open problems with the parallel copy feature:

1) Choosing between two designs: the leader identifies the line/tuple
boundaries in the file and lets the workers pick them up and insert in
parallel, vs. the leader only reads the file and lets the workers
identify the line/tuple boundaries and insert
2) Determining parallel safety of partitioned tables
3) Bulk extension of the relation while inserting, i.e. adding more than
one extra block to the relation in RelationAddExtraBlocks
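
To make the first design in (1) concrete, here is a minimal sketch of the
leader-side step: scan a chunk of input once and record where each line
starts and how long it is, so that workers can later claim entries by
index. It is not taken from the patch. The names LineBound and
find_line_bounds are hypothetical, a plain array stands in for shared
memory, and it assumes text format with '\n' terminators (no CSV quoting
or escape handling).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one input line; not a PostgreSQL structure. */
typedef struct LineBound
{
    size_t      start;      /* offset of the first byte of the line */
    size_t      len;        /* length, excluding the trailing newline */
} LineBound;

/*
 * Leader side: scan buf once and record the boundaries of each line.
 * Returns the number of lines recorded (at most max_bounds).
 * Assumption: lines end in '\n'; quoting and escapes are ignored.
 */
size_t
find_line_bounds(const char *buf, size_t buflen,
                 LineBound *bounds, size_t max_bounds)
{
    size_t      nlines = 0;
    size_t      start = 0;

    assert(buf != NULL && bounds != NULL);

    while (start < buflen && nlines < max_bounds)
    {
        /* Look for the next newline in the unscanned part of the buffer. */
        const char *nl = memchr(buf + start, '\n', buflen - start);
        size_t      end = (nl != NULL) ? (size_t) (nl - buf) : buflen;

        bounds[nlines].start = start;
        bounds[nlines].len = end - start;
        nlines++;

        /* Skip past the newline (or past the end of the buffer). */
        start = end + 1;
    }
    return nlines;
}
```

In the alternative design each worker would have to run a scan like this
itself on the raw data it receives, instead of reading ready-made
(start, len) pairs.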

Please let me know if I'm missing anything.

For (1) - Vignesh's experiments above show that the design in which the
leader identifies the line/tuple boundaries in the file and lets the
workers pick them up and insert in parallel fares better.
For (2) - while it's being discussed in another thread (I'm not sure
what the status of that thread is), how about we take this feature
without support for partitioned tables, i.e. parallel copy is
disabled for partitioned tables? Once the other discussion reaches a
logical end, we can come back and enable parallel copy for partitioned
tables.
For (3) - we need a way to extend the relation, i.e. add new blocks,
quickly. fallocate might help here. I'm not sure who's working on it;
others can comment better here.
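
As a rough illustration of the idea in (3), the sketch below grows a file
by several blocks with one call instead of writing one block at a time.
It is only a standalone sketch, not PostgreSQL code: extend_by_blocks is a
hypothetical helper, BLOCK_SIZE is assumed to match the default BLCKSZ of
8192, and it uses posix_fallocate (the portable POSIX call) rather than
the Linux-specific fallocate.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192         /* assumed: PostgreSQL's default BLCKSZ */

/*
 * Hypothetical helper: extend the file behind fd by nblocks blocks in a
 * single call. Returns 0 on success or an errno value on failure, which
 * is the same convention posix_fallocate itself uses.
 */
int
extend_by_blocks(int fd, unsigned int nblocks)
{
    struct stat st;

    assert(nblocks > 0);

    /* Find the current end of the file; the new blocks start there. */
    if (fstat(fd, &st) != 0)
        return errno;

    return posix_fallocate(fd, st.st_size, (off_t) nblocks * BLOCK_SIZE);
}
```

Whether a call like this is actually faster than extending block by block
would need to be measured for the target file system.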

Can we take the "parallel copy" feature forward, with some restrictions
in place of course?

Thoughts?

Regards,
Bharath Rupireddy.
