On Fri, Feb 21, 2020 at 02:54:31PM +0200, Ants Aasma wrote:
> On Thu, 20 Feb 2020 at 18:43, David Fetter <da...@fetter.org> wrote:
> > On Thu, Feb 20, 2020 at 02:36:02PM +0100, Tomas Vondra wrote:
> > > I think the wc2 is showing that maybe instead of parallelizing the
> > > parsing, we might instead try using a different tokenizer/parser and
> > > make the implementation more efficient instead of just throwing more
> > > CPUs on it.
> > That was what I had in mind.
> > > I don't know if our code is similar to what wc does, maybe parsing
> > > csv is more complicated than what wc does.
> > CSV parsing differs from wc in that there are more states in the state
> > machine, but I don't see anything fundamentally different.
> The trouble with a state machine based approach is that the state
> transitions form a dependency chain, which means that at best the
> processing rate will be 4-5 cycles per byte (L1 latency to fetch the
> next state).
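To make sure I understand the dependency you mean, here is a minimal
table-driven scanner sketch (the states and table are made up for
illustration, not the actual CopyReadLineText logic):

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal CSV record scanner as a table-driven state machine.
 * Illustrative only: states and table are invented for this sketch,
 * not PostgreSQL's actual code.  NORMAL = outside quotes, QUOTED =
 * inside "...", QUOTE_SEEN = a quote inside QUOTED that may be either
 * the first half of a doubled "" escape or the closing quote. */
enum { S_NORMAL, S_QUOTED, S_QUOTE_SEEN, S_NSTATES };

/* next_state[state][byte class]; classes: 0 = other, 1 = quote, 2 = newline */
static const uint8_t next_state[S_NSTATES][3] = {
    /* S_NORMAL     */ { S_NORMAL,     S_QUOTED,     S_NORMAL },
    /* S_QUOTED     */ { S_QUOTED,     S_QUOTE_SEEN, S_QUOTED },
    /* S_QUOTE_SEEN */ { S_NORMAL,     S_QUOTED,     S_NORMAL },
};

static inline int classify(uint8_t c)
{
    return c == '"' ? 1 : (c == '\n' ? 2 : 0);
}

/* Returns the index of the first newline outside quotes, or len. */
size_t find_record_end(const uint8_t *buf, size_t len)
{
    int state = S_NORMAL;
    for (size_t i = 0; i < len; i++)
    {
        int cls = classify(buf[i]);
        /* This load depends on the previous iteration's state, so the
         * iterations form a serial chain through the table lookup. */
        state = next_state[state][cls];
        if (cls == 2 && state == S_NORMAL)
            return i;
    }
    return len;
}
```

Each iteration's table load can't start until the previous one has
produced its state, so the ~4-5 cycles/byte bound follows directly from
the L1 load-to-use latency, no matter how wide the core is.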
> I whipped together a quick prototype that uses SIMD and bitmap
> manipulations to do the equivalent of CopyReadLineText() in CSV mode,
> including quote and escape handling; this runs at 0.25-0.5 cycles per
> byte.
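For the archives, here is a rough portable sketch of how I imagine the
bitmap half of that works (entirely my guess at the technique, with
made-up names, not your actual prototype; presumably the real code
builds the masks with SIMD byte-compares and movemask rather than the
scalar loop below):

```c
#include <stdint.h>
#include <stddef.h>

/* Guessed reconstruction of the bitmap trick: per 64-byte block, build
 * bitmasks of quote and newline positions, turn the quote mask into an
 * "inside quotes" mask with a prefix XOR, and keep only newlines that
 * fall outside quotes.  Doubled "" escapes need no special case: the
 * two quotes toggle the parity twice and cancel out. */

static inline uint64_t prefix_xor(uint64_t x)
{
    /* After these folds, bit i = parity of bits 0..i of the input,
     * i.e. 1 while we are inside an unclosed quoted region. */
    x ^= x << 1;  x ^= x << 2;  x ^= x << 4;
    x ^= x << 8;  x ^= x << 16; x ^= x << 32;
    return x;
}

/* Process one 64-byte block; *carry is 1 if the previous block ended
 * inside quotes.  Returns a bitmask of record-ending newlines. */
uint64_t block_newlines(const uint8_t block[64], uint64_t *carry)
{
    uint64_t quotes = 0, newlines = 0;

    for (int i = 0; i < 64; i++)    /* stand-in for a SIMD compare + movemask */
    {
        quotes   |= (uint64_t) (block[i] == '"')  << i;
        newlines |= (uint64_t) (block[i] == '\n') << i;
    }

    /* 0 - carry is 0 or all-ones, flipping parity if we started quoted. */
    uint64_t in_quotes = prefix_xor(quotes) ^ (0 - *carry);

    *carry = (in_quotes >> 63) & 1;
    return newlines & ~in_quotes;
}
```

The per-word work is independent of the data apart from the one-bit
quote-parity carry between blocks, which is what breaks the per-byte
dependency chain and lets this run well under a cycle per byte.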
Interesting. How does that compare to what we currently have?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services