Re: Parallel copy

Kyotaro Horiguchi Tue, 18 Feb 2020 03:01:13 -0800

At Tue, 18 Feb 2020 15:59:36 +0530, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Tue, Feb 18, 2020 at 7:28 AM Kyotaro Horiguchi
> <horikyota....@gmail.com> wrote:
> >
> > In an extreme case, if we didn't see a QUOTE in a chunk, we cannot
> > know the chunk is in a quoted section or not, until all the past
> > chunks are parsed.  After all we are forced to parse fully
> > sequentially as far as we allow QUOTE.
> >
> 
> Right, I think the benefits of this as compared to single reader idea
> would be (a) we can save accessing shared memory for the most part of
> the chunk (b) for non-csv mode, even the tokenization (finding line
> boundaries) would also be parallel.   OTOH, doing processing
> differently for csv and non-csv mode might not be good.


Agreed. So I think it's a good point of compromise.
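
To illustrate the dependency discussed above, here is a minimal sketch (not
from any patch; the function name line_ends and its arguments are made up for
this mail). It assumes the CSV escape character is the same as the quote
character, so a doubled quote simply toggles the state twice. The same chunk
gives different line boundaries depending on the quote state it inherits from
the previous chunks:

```python
def line_ends(chunk: str, in_quote: bool, quote: str = '"') -> tuple[list[int], bool]:
    """Return offsets of record-terminating newlines in chunk and the final quote state.

    in_quote is the state inherited from all earlier chunks; a worker that
    does not know it cannot tell which newlines end a record.
    """
    ends = []
    for i, ch in enumerate(chunk):
        if ch == quote:
            # Assumption: ESCAPE == QUOTE, so "" toggles twice and the state is preserved.
            in_quote = not in_quote
        elif ch == "\n" and not in_quote:
            ends.append(i)
    return ends, in_quote


chunk = "b,c\nd,e\n"                # no quote character anywhere in this chunk
print(line_ends(chunk, False))      # both newlines are record ends
print(line_ends(chunk, True))       # no record ends; still inside the quoted field
```

A chunk with no quote in it cannot decide between the two results by itself,
which is the extreme case mentioned above.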

> > On the other hand, if we allowed "COPY t FROM f WITH (FORMAT CSV,
> > QUOTE '')" in order to signal that there's no quoted section in the
> > file then all chunks would be fully concurrently parsable.
> >
> 
> Yeah, if we can provide such an option, we can probably make parallel
> csv processing equivalent to non-csv.  However, users might not like
> this as I think in some cases it won't be easier for them to tell
> whether the file has quoted fields or not.  I am not very sure of this
> point.

I'm not sure how large a portion of real-world usage involves quoted
sections, so I'm not sure how useful it would be.
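
For comparison, a rough sketch of the case without quoted sections, assuming
also that there are no escaped newlines in the data. The names newline_offsets
and parallel_lines are hypothetical. Each chunk is scanned with no knowledge
of the others, and the offsets are only stitched together afterwards:

```python
def newline_offsets(chunk: str, base: int) -> list[int]:
    """Scan one chunk independently; base is the chunk's offset in the whole input."""
    return [base + i for i, ch in enumerate(chunk) if ch == "\n"]


def parallel_lines(data: str, chunk_size: int) -> list[str]:
    """Find line boundaries chunk by chunk, then cut the input at those offsets."""
    offsets = []
    for base in range(0, len(data), chunk_size):
        # Each call depends only on its own chunk, so it could run in a separate worker.
        offsets.extend(newline_offsets(data[base:base + chunk_size], base))
    lines, start = [], 0
    for end in offsets:
        lines.append(data[start:end])
        start = end + 1
    return lines


data = "1,a\n2,bb\n3,ccc\n"
print(parallel_lines(data, 4) == data.split("\n")[:-1])
```

Here the loop is sequential only for simplicity; nothing in newline_offsets
needs the result of an earlier chunk.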

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
