At Tue, 18 Feb 2020 15:59:36 +0530, Amit Kapila <amit.kapil...@gmail.com> wrote in
> On Tue, Feb 18, 2020 at 7:28 AM Kyotaro Horiguchi
> <horikyota....@gmail.com> wrote:
> >
> > In an extreme case, if we didn't see a QUOTE in a chunk, we cannot
> > know the chunk is in a quoted section or not, until all the past
> > chunks are parsed. After all we are forced to parse fully
> > sequentially as far as we allow QUOTE.
> >
>
> Right, I think the benefits of this as compared to single reader idea
> would be (a) we can save accessing shared memory for the most part of
> the chunk (b) for non-csv mode, even the tokenization (finding line
> boundaries) would also be parallel. OTOH, doing processing
> differently for csv and non-csv mode might not be good.

Agreed. So I think it's a good point of compromise.

> > On the other hand, if we allowed "COPY t FROM f WITH (FORMAT CSV,
> > QUOTE '')" in order to signal that there's no quoted section in the
> > file then all chunks would be fully concurrently parsable.
> >
>
> Yeah, if we can provide such an option, we can probably make parallel
> csv processing equivalent to non-csv. However, users might not like
> this as I think in some cases it won't be easier for them to tell
> whether the file has quoted fields or not. I am not very sure of this
> point.

I'm not sure how large a portion of real-world usage contains quoted
sections, so I'm not sure how useful such an option would be.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
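
For illustration only, here is a small Python sketch of the chunk problem discussed above. It is not PostgreSQL code; the function names (scan_chunk, chunk_start_states) are made up for this example, and it assumes ESCAPE is the same character as QUOTE (the CSV default), so that toggling on every quote character tracks the quoted state. It shows that a chunk containing no QUOTE yields different line boundaries depending on the state inherited from earlier chunks, and that with an empty QUOTE (the hypothetical "QUOTE ''" option, which COPY does not accept today) every chunk can be scanned independently.

```python
def scan_chunk(chunk, in_quote, quote='"'):
    """Find record-ending newlines in one chunk.

    in_quote is the quote state inherited from all earlier chunks.
    Returns (offsets of newlines that end a record, quote state at chunk end).
    Assumes ESCAPE == QUOTE, so a doubled quote toggles the state twice.
    """
    ends = []
    for i, ch in enumerate(chunk):
        if quote and ch == quote:
            in_quote = not in_quote
        elif ch == "\n" and not in_quote:
            ends.append(i)
    return ends, in_quote


def chunk_start_states(chunks, quote='"'):
    """Sequential pass: quote state at the start of each chunk.

    Only the parity of the quote count per chunk is needed, but the
    states themselves must be propagated in order.
    """
    states = []
    in_quote = False
    for chunk in chunks:
        states.append(in_quote)
        if quote and chunk.count(quote) % 2 == 1:
            in_quote = not in_quote
    return states


# One record with embedded newlines in a quoted field, split into 3 chunks.
# The middle chunk contains no quote character at all.
chunks = ['a,"x\n', 'y\nz\n', 'w",b\n']

states = chunk_start_states(chunks)

# The middle chunk alone is ambiguous: the result depends on in_quote.
inside = scan_chunk(chunks[1], True)
outside = scan_chunk(chunks[1], False)

# With no quote character, every chunk starts unquoted and every
# newline is a record boundary, so chunks are independent.
no_quote_states = chunk_start_states(chunks, quote='')
no_quote_scan = scan_chunk(chunks[1], False, quote='')
```

Counting quote characters per chunk could itself be done in parallel; only the propagation of the starting state has to be sequential. Whether that split is worthwhile in practice is a separate question that this sketch does not answer.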