Re: Optimizing COPY with SIMD

Neil Conway Fri, 07 Jun 2024 11:08:04 -0700

On Wed, Jun 5, 2024 at 3:05 PM Nathan Bossart <[email protected]>
wrote:


> For pg_lfind32(), we ended up using an overlapping approach for the
> vectorized case (see commit 7644a73).  That appeared to help more than it
> harmed in the many (admittedly branch predictor friendly) tests I ran.  I
> wonder if you could do something similar here.
>

I didn't entirely follow what you are suggesting here -- seems like we
would need to do strlen() for the non-SIMD case if we tried to use a
similar approach.
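
For reference, here is a minimal sketch of the general overlapping-tail
idea, not the pg_lfind32() code itself. It assumes the length is known up
front, which is exactly the requirement discussed above. CHUNK,
chunk_contains() and contains_byte_overlapping() are hypothetical names,
and the inner loop of chunk_contains() is a portable stand-in for a single
vector comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK 16                /* assumed stand-in for sizeof(Vector8) */

/* Hypothetical stand-in for one vector compare over CHUNK bytes. */
static bool
chunk_contains(const uint8_t *p, uint8_t key)
{
    bool        found = false;

    for (size_t i = 0; i < CHUNK; i++)
        found |= (p[i] == key);
    return found;
}

/* Returns true if key occurs in buf[0 .. len - 1]. */
static bool
contains_byte_overlapping(const uint8_t *buf, size_t len, uint8_t key)
{
    size_t      i;

    assert(buf != NULL || len == 0);

    /* Inputs shorter than one chunk cannot use the overlapping trick. */
    if (len < CHUNK)
    {
        for (i = 0; i < len; i++)
            if (buf[i] == key)
                return true;
        return false;
    }

    /* Full chunks. */
    for (i = 0; i + CHUNK <= len; i += CHUNK)
        if (chunk_contains(buf + i, key))
            return true;

    /*
     * Tail: instead of a scalar loop, re-check the last CHUNK bytes.  This
     * overlaps bytes that were already examined, which is harmless for a
     * pure membership test.
     */
    if (i < len)
        return chunk_contains(buf + len - CHUNK, key);
    return false;
}
```

Note that the tail step needs `len`, so a NUL-terminated attribute would
still have to pay for a strlen() before this could be applied.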

> It'd be interesting to see the threshold where your patch starts winning.
> IIUC the vector stuff won't take effect until there are 16 bytes to
> process.  If we don't expect attributes to ordinarily be >= 16 bytes, it
> might be worth trying to mitigate this ~3% regression.  Maybe we can find
> some other small gains elsewhere to offset it.
>

For the particular short-strings benchmark I have been using (3 columns
with 8-character ASCII strings in each), I suspect the regression is caused
by the need to do a strlen(), rather than the vectorized loop itself (we
skip the vectorized loop anyway because sizeof(Vector8) == 16 on this
machine). (This explains why we see a regression on short strings for text
but not CSV: CSV needed to do a strlen() for the non-quoted-string case
regardless). Unfortunately this makes it tricky to make the optimization
conditional on the length of the string. I suppose we could play some games
where we start with a byte-by-byte loop and then switch over to the
vectorized path (and take a strlen()) if we have seen more than, say,
sizeof(Vector8) bytes so far. Seems a bit kludgy though.
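
To make the "start scalar, then switch" idea concrete, here is a rough
sketch under stated assumptions; it is not code from the attached patches.
CHUNK, needs_escape() and first_escape_offset() are hypothetical names,
the escape rule is simplified to control characters, backslash and the
delimiter, and the chunked inner loop stands in for the real vector
comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 16                /* assumed stand-in for sizeof(Vector8) */

/* Simplified, hypothetical escape rule for text-format output. */
static inline bool
needs_escape(unsigned char c, char delim)
{
    return c < 0x20 || c == '\\' || c == (unsigned char) delim;
}

/*
 * Returns the offset of the first byte that needs escaping, or the string
 * length if there is none.
 */
static size_t
first_escape_offset(const char *s, char delim)
{
    size_t      i;
    size_t      len;

    assert(s != NULL);

    /* Phase 1: byte-by-byte, so short attributes never call strlen(). */
    for (i = 0; i < CHUNK; i++)
    {
        if (s[i] == '\0' || needs_escape((unsigned char) s[i], delim))
            return i;
    }

    /* Phase 2: the string is known to be long; pay for strlen() once. */
    len = i + strlen(s + i);

    for (; i + CHUNK <= len; i += CHUNK)
    {
        bool        hit = false;

        /* Stand-in for one vector comparison over CHUNK bytes. */
        for (size_t j = 0; j < CHUNK; j++)
            hit |= needs_escape((unsigned char) s[i + j], delim);
        if (hit)
            break;
    }

    /* Scalar pass: pinpoint the byte inside a hit chunk, or scan the tail. */
    for (; i < len; i++)
        if (needs_escape((unsigned char) s[i], delim))
            return i;
    return len;
}
```

With 8-character attributes, phase 1 returns at the terminator and neither
strlen() nor the chunked loop runs; the cost is an extra branch and a
second code path, which is the kludgy part.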

I will do some more benchmarking and report back. For the time being, I'm
not inclined to push to get CopyAttributeOutTextVector() into the tree in
its current state, as I agree that the short-attribute case is quite
important.

In the meantime, attached is a revised patch series. This uses SIMD to
optimize CopyReadLineText in COPY FROM. Performance results:

====
master @ 8fea1bd5411b:

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-long-strings.sql
  Time (mean ± σ):      1.944 s ±  0.013 s    [User: 0.001 s, System: 0.000 s]
  Range (min … max):    1.927 s …  1.975 s    10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-short-strings.sql
  Time (mean ± σ):      1.021 s ±  0.017 s    [User: 0.002 s, System: 0.001 s]
  Range (min … max):    1.005 s …  1.053 s    10 runs

master + SIMD patches:

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-long-strings.sql
  Time (mean ± σ):      1.513 s ±  0.022 s    [User: 0.001 s, System: 0.000 s]
  Range (min … max):    1.493 s …  1.552 s    10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-short-strings.sql
  Time (mean ± σ):      1.032 s ±  0.032 s    [User: 0.002 s, System: 0.001 s]
  Range (min … max):    1.009 s …  1.113 s    10 runs
====

Neil

v4-0005-Optimize-COPY-TO-in-text-format-using-SIMD.patch
Description: Binary data

v4-0003-Cosmetic-code-cleanup-for-CopyReadLineText.patch
Description: Binary data

v4-0004-Optimize-COPY-TO-in-CSV-format-using-SIMD.patch
Description: Binary data

v4-0002-Improve-COPY-test-coverage-for-handling-of-contro.patch
Description: Binary data

v4-0001-Adjust-misleading-comment-placement.patch
Description: Binary data

v4-0006-Optimize-COPY-FROM-using-SIMD.patch
Description: Binary data
