On Wed, Jun 5, 2024 at 3:05 PM Nathan Bossart <nathandboss...@gmail.com> wrote:
> For pg_lfind32(), we ended up using an overlapping approach for the
> vectorized case (see commit 7644a73). That appeared to help more than it
> harmed in the many (admittedly branch predictor friendly) tests I ran. I
> wonder if you could do something similar here.

I didn't entirely follow what you are suggesting here; it seems like we would need to do strlen() for the non-SIMD case if we tried to use a similar approach.

> It'd be interesting to see the threshold where your patch starts winning.
> IIUC the vector stuff won't take effect until there are 16 bytes to
> process. If we don't expect attributes to ordinarily be >= 16 bytes, it
> might be worth trying to mitigate this ~3% regression. Maybe we can find
> some other small gains elsewhere to offset it.

For the particular short-strings benchmark I have been using (3 columns with 8-character ASCII strings in each), I suspect the regression is caused by the need to do a strlen(), rather than by the vectorized loop itself (we skip the vectorized loop anyway, because sizeof(Vector8) == 16 on this machine). This explains why we see a regression on short strings for text but not for CSV: CSV needed to do a strlen() for the non-quoted-string case regardless.

Unfortunately, this makes it tricky to make the optimization conditional on the length of the string. I suppose we could play some games where we start with a byte-by-byte loop and then switch over to the vectorized path (and take a strlen()) once we have seen more than, say, sizeof(Vector8) bytes. That seems a bit kludgy, though. I will do some more benchmarking and report back.

For the time being, I'm not inclined to push to get CopyAttributeOutTextVector() into the tree in its current state, as I agree that the short-attribute case is quite important.

In the meantime, attached is a revised patch series. This uses SIMD to optimize CopyReadLineText() in COPY FROM.
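For the COPY TO discussion above, the hybrid idea (start byte-by-byte, and only take a strlen() and switch to a chunked path once more than sizeof(Vector8) bytes have been seen) might look roughly like the following. This is a minimal sketch under stated assumptions, not code from the patch series: CHUNK stands in for sizeof(Vector8), chunk_has_escape() is a hypothetical scalar placeholder for real Vector8 operations, and the set of bytes that need escaping (control characters, backslash, delimiter) is an assumption modeled on text-format COPY TO.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sizeof(Vector8); 16 on the benchmark machine. */
#define CHUNK 16

/*
 * Assumed escape set, modeled on COPY TO text format: control characters
 * (below 0x20), backslash, and the column delimiter.
 */
static int
needs_escape(unsigned char c, char delim)
{
	return c < 0x20 || c == '\\' || c == (unsigned char) delim;
}

/*
 * Hypothetical placeholder for a Vector8 test over one CHUNK-byte block.
 * Written as a plain loop so the sketch compiles anywhere; a real version
 * would load the block into a vector register and compare all lanes at once.
 */
static int
chunk_has_escape(const char *p, char delim)
{
	int			found = 0;

	for (int i = 0; i < CHUNK; i++)
		found |= needs_escape((unsigned char) p[i], delim);
	return found;
}

/*
 * Return the offset of the first byte of NUL-terminated s that needs
 * escaping, or strlen(s) if there is none.  Strings shorter than CHUNK bytes
 * are handled entirely by the byte-by-byte loop and never pay for strlen().
 */
static size_t
find_first_escape(const char *s, char delim)
{
	size_t		i;
	size_t		len;

	assert(s != NULL);

	/* Phase 1: byte-by-byte, stopping at the terminator; no strlen(). */
	for (i = 0; i < CHUNK; i++)
	{
		if (s[i] == '\0' || needs_escape((unsigned char) s[i], delim))
			return i;
	}

	/* Phase 2: the string has at least CHUNK bytes, so now take strlen(). */
	len = i + strlen(s + i);
	while (i + CHUNK <= len && !chunk_has_escape(s + i, delim))
		i += CHUNK;

	/* Locate the exact byte inside a matching chunk, or scan the short tail. */
	for (; i < len; i++)
	{
		if (needs_escape((unsigned char) s[i], delim))
			return i;
	}
	return len;
}
```

With this shape, the 8-character attributes from the short-strings benchmark return from phase 1 without ever calling strlen(), while long attributes pay for a terminator check on their first CHUNK bytes before switching over; that extra first phase is the kludginess mentioned above. Unlike the overlapping approach from commit 7644a73, nothing here needs the length up front.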
Performance results:

====
master @ 8fea1bd5411b:

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-long-strings.sql
  Time (mean ± σ):      1.944 s ±  0.013 s    [User: 0.001 s, System: 0.000 s]
  Range (min … max):    1.927 s …  1.975 s    10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-short-strings.sql
  Time (mean ± σ):      1.021 s ±  0.017 s    [User: 0.002 s, System: 0.001 s]
  Range (min … max):    1.005 s …  1.053 s    10 runs

master + SIMD patches:

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-long-strings.sql
  Time (mean ± σ):      1.513 s ±  0.022 s    [User: 0.001 s, System: 0.000 s]
  Range (min … max):    1.493 s …  1.552 s    10 runs

Benchmark 1: ./psql -f /Users/neilconway/copy-from-large-short-strings.sql
  Time (mean ± σ):      1.032 s ±  0.032 s    [User: 0.002 s, System: 0.001 s]
  Range (min … max):    1.009 s …  1.113 s    10 runs
====

Neil
v4-0005-Optimize-COPY-TO-in-text-format-using-SIMD.patch
v4-0003-Cosmetic-code-cleanup-for-CopyReadLineText.patch
v4-0004-Optimize-COPY-TO-in-CSV-format-using-SIMD.patch
v4-0002-Improve-COPY-test-coverage-for-handling-of-contro.patch
v4-0001-Adjust-misleading-comment-placement.patch
v4-0006-Optimize-COPY-FROM-using-SIMD.patch