Hi,
On Mon, 2 Mar 2026 at 22:55, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Feb 25, 2026 at 05:24:27PM +0300, Nazir Bilal Yavuz wrote:
> > If anyone has any suggestions/ideas, please let me know!
I was able to fix the problem. My first assumption was that the
branching into the SIMD code caused it, so I moved the SIMD code into a
CopyReadLineTextSIMDHelper() function and called it at the top of
CopyReadLineText(). That way there is no branching inside the non-SIMD
(scalar) code path. This didn't solve the problem. Then I realized that
even when I disabled the SIMD code path with 'if (false)' there was
still a regression, but if I commented out the whole
'if (cstate->simd_enabled)' branch, there was no regression at all.
To find out more, I compared the assembly output of both builds and
found a likely cause: the compiler can't promote a variable to a
register, so the variable lives on the stack instead, which is slower.
Here are the two assembly outputs:
Slow code:
c = copy_input_buf[input_buf_ptr++];
db0: 48 8b 55 b8 mov -0x48(%rbp),%rdx
db4: 48 63 c6 movslq %esi,%rax
db7: 44 8d 66 01 lea 0x1(%rsi),%r12d
dbb: 44 89 65 cc mov %r12d,-0x34(%rbp)
dbf: 0f be 14 02 movsbl (%rdx,%rax,1),%edx
Fast code:
c = copy_input_buf[input_buf_ptr++];
d80: 49 63 c4 movslq %r12d,%rax
d83: 45 8d 5c 24 01 lea 0x1(%r12),%r11d
d88: 41 0f be 04 06 movsbl (%r14,%rax,1),%eax
The reason is that the address of input_buf_ptr is passed to the
helper, as in CopyReadLineTextSIMDHelper(..., &input_buf_ptr). If I
change it to this:
int temp_input_buf_ptr = input_buf_ptr;
CopyReadLineTextSIMDHelper(..., &temp_input_buf_ptr);
Then there is no regression. However, I am still not completely sure
whether the same problem affects v10; I am planning to spend more
time debugging it.
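A minimal, self-contained sketch of the pattern, for illustration only:
fast_skip() is a hypothetical stand-in for CopyReadLineTextSIMDHelper(),
and the two callers differ only in whether the scanning variable's own
address or a temporary's address is passed to it. Both return the same
result; the difference is in code generation. A local whose address has
been passed to a non-inlined function is typically kept in a stack slot,
while a local whose address is never taken can be promoted to a
register. Whether this shows up in practice depends on the compiler and
optimization level.

```c
#include <assert.h>

/* Keep the helper out of line so that passing &pos really escapes (GCC/Clang). */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/*
 * Hypothetical stand-in for CopyReadLineTextSIMDHelper(): advance *pos over
 * at most 16 bytes that are neither '\n' nor '\\'.
 */
static NOINLINE void
fast_skip(const char *buf, int len, int *pos)
{
    int         limit = *pos + 16;

    assert(*pos >= 0 && *pos <= len);
    while (*pos < len && *pos < limit &&
           buf[*pos] != '\n' && buf[*pos] != '\\')
        (*pos)++;
}

/* Regressing pattern: the address of the hot loop's index escapes. */
int
line_end_escaping(const char *buf, int len)
{
    int         pos = 0;

    fast_skip(buf, len, &pos);
    while (pos < len && buf[pos] != '\n')   /* pos may stay in memory */
        pos++;
    return pos;
}

/* v11 pattern: only a temporary escapes, so pos can live in a register. */
int
line_end_temp(const char *buf, int len)
{
    int         pos = 0;
    int         tmp = pos;

    fast_skip(buf, len, &tmp);
    pos = tmp;
    while (pos < len && buf[pos] != '\n')
        pos++;
    return pos;
}
```

Comparing `objdump -d` output of the two callers at -O2 is one way to see
whether the loop index is reloaded from the stack in the first variant.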
> A couple of random ideas:
>
> * Additional inlining for callers. I looked around a little bit and didn't
> see any great candidates, so I don't have much faith in this, but maybe
> you'll see something I don't.
I agree with you. CopyReadLineText() is already quite a big function.
> * Disable SIMD if we are consistently getting small rows. That won't help
> your "wide & CSV 1/3" case in all likelihood, but perhaps it'll help with
> the regression for narrow rows described elsewhere.
I implemented this: two consecutive short rows (rows too short to fill
a full-sized vector) now disable SIMD.
> * Surround the variable initializations with "if (simd_enabled)".
> Presumably compilers are smart enough to remove those in the non-SIMD paths
> already, but it could be worth a try.
Done.
> * Add simd_enabled function parameter to CopyReadLine(),
> NextCopyFromRawFieldsInternal(), and CopyFromTextLikeOneRow(), and do the
> bool literal trick in CopyFrom{Text,CSV}OneRow(). That could encourage the
> compiler to do some additional optimizations to reduce branching.
I don't think we need this, at least not with the
CopyReadLineTextSIMDHelper() implementation, since the branch is at the
top of CopyReadLineText() and is taken only once per line.
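For reference, the "bool literal trick" mentioned above works roughly as
in this sketch (hypothetical names, not PostgreSQL code): the runtime
flag is tested once in a small wrapper, which then calls an
always-inline worker with a literal true or false, so the compiler can
emit two specialized copies of the worker with the flag's branches
folded away. PostgreSQL spells the attribute pg_attribute_always_inline;
the sketch defines a local macro instead so that it stands alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for pg_attribute_always_inline. */
#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

/*
 * Hypothetical worker: count newlines either with memchr() (the "fast"
 * path) or byte by byte.  use_memchr is expected to be a compile-time
 * constant at every call site.
 */
static ALWAYS_INLINE int
count_newlines_internal(const char *buf, size_t len, bool use_memchr)
{
    int         count = 0;

    assert(buf != NULL);
    if (use_memchr)
    {
        const char *p = buf;
        const char *end = buf + len;

        while (p < end && (p = memchr(p, '\n', (size_t) (end - p))) != NULL)
        {
            count++;
            p++;                /* continue after the newline just found */
        }
    }
    else
    {
        for (size_t i = 0; i < len; i++)
            if (buf[i] == '\n')
                count++;
    }
    return count;
}

/* Branch on the runtime flag once, then call the worker with a literal. */
int
count_newlines(const char *buf, size_t len, bool use_memchr)
{
    if (use_memchr)
        return count_newlines_internal(buf, len, true);
    return count_newlines_internal(buf, len, false);
}
```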
I think v11 looks better than v10. I like the
CopyReadLineTextSIMDHelper() helper function, and I also like that it is
called at the top of CopyReadLineText() rather than from inside the
scalar path. This gives us more room for optimization without affecting
the scalar path.
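To illustrate the per-chunk test in CopyReadLineTextSIMDHelper()
(compare the chunk against each special byte, OR the results, and take
the lowest set bit of the high-bit mask) without SSE2 or Neon, here is a
portable model using 8-byte SWAR arithmetic. This is a swapped-in
technique for illustration, not the patch's Vector8 code, and all names
are hypothetical. It relies on the classic "has zero byte" bit trick,
whose result is exact for the lowest matching byte, which is the only
one the scan needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Load 8 bytes as a little-endian word, independent of host byte order. */
static uint64_t
load_le64(const unsigned char *p)
{
    uint64_t    w = 0;

    for (int i = 0; i < 8; i++)
        w |= (uint64_t) p[i] << (8 * i);
    return w;
}

/*
 * Set the high bit of each byte of w that equals c.  Bytes above the first
 * match may be flagged spuriously, but the lowest flagged byte is exact
 * (this plays the role of vector8_eq() in the patch).
 */
static uint64_t
eq_mask(uint64_t w, unsigned char c)
{
    uint64_t    v = w ^ (ONES * c);     /* matching bytes become 0x00 */

    return (v - ONES) & ~v & HIGHS;
}

/* Index (0..7) of the first '\n', '\r' or extra byte in p[0..7], or -1. */
static int
first_special8(const unsigned char *p, unsigned char extra)
{
    uint64_t    w = load_le64(p);
    /* OR keeps the lowest flag exact: it is the minimum of exact minima. */
    uint64_t    m = eq_mask(w, '\n') | eq_mask(w, '\r') | eq_mask(w, extra);
    int         idx = 0;

    if (m == 0)
        return -1;
    while ((m & 0x80) == 0)     /* like pg_rightmost_one_pos32(), per byte */
    {
        m >>= 8;
        idx++;
    }
    assert(idx < 8);
    return idx;
}

/*
 * Skip 8-byte chunks containing no special byte.  Returns the position of
 * the first special byte found, or where the scalar code should take over
 * once 8 or fewer bytes remain (the patch likewise requires strictly more
 * than one vector's worth of remaining bytes).
 */
size_t
skip_plain_chunks(const char *buf, size_t len, unsigned char extra)
{
    size_t      pos = 0;

    while (len - pos > 8)
    {
        int         idx = first_special8((const unsigned char *) buf + pos,
                                         extra);

        if (idx >= 0)
            return pos + (size_t) idx;
        pos += 8;
    }
    return pos;
}
```

In text mode extra would be the backslash; in CSV mode it would be the
quote character (the patch also adds the escape character when it
differs from the quote).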
Here are the new benchmark results. I benchmarked the changes with both
-O2 and -O3, and both with and without the commit that changes
default_toast_compression to lz4 (65def42b1d5). The results show no
regression (the few narrow-row cases that are slower differ by 0.5% or
less), and the improvement is much bigger with 65def42b1d5: for wide rows
without special characters it is about 1.7x for the text format and
about 2.4x for the CSV format.
------------------------------
Benchmark results:
With 65def42b1d5:
+---------------------------------------------------------+
| Optimization: -O2 |
+--------------------------+--------------+---------------+
| | Text | CSV |
+--------------------------+------+-------+-------+-------+
| WIDE | None | 1/3 | None | 1/3 |
+--------------------------+------+-------+-------+-------+
| Old Master | 4220 | 4780 | 5930 | 8250 |
+--------------------------+------+-------+-------+-------+
| Old Master + 0001 + 0002 | 2520 | 4500 | 2520 | 7800 |
+--------------------------+------+-------+-------+-------+
| | | | | |
+--------------------------+------+-------+-------+-------+
| | Text | CSV |
+--------------------------+------+-------+-------+-------+
| NARROW | None | 1/3 | None | 1/3 |
+--------------------------+------+-------+-------+-------+
| Old Master | 9920 | 10100 | 10200 | 10470 |
+--------------------------+------+-------+-------+-------+
| Old Master + 0001 + 0002 | 9970 | 10000 | 10180 | 10350 |
+--------------------------+------+-------+-------+-------+
| |
+---------------------------------------------------------+
| |
+---------------------------------------------------------+
| Optimization: -O3 |
+--------------------------+--------------+---------------+
| | Text | CSV |
+--------------------------+------+-------+-------+-------+
| WIDE | None | 1/3 | None | 1/3 |
+--------------------------+------+-------+-------+-------+
| Old Master | 4100 | 4900 | 6200 | 8300 |
+--------------------------+------+-------+-------+-------+
| Old Master + 0001 + 0002 | 2470 | 4440 | 2570 | 7700 |
+--------------------------+------+-------+-------+-------+
| | | | | |
+--------------------------+------+-------+-------+-------+
| | Text | CSV |
+--------------------------+------+-------+-------+-------+
| NARROW | None | 1/3 | None | 1/3 |
+--------------------------+------+-------+-------+-------+
| Old Master | 9530 | 9690 | 9800 | 10080 |
+--------------------------+------+-------+-------+-------+
| Old Master + 0001 + 0002 | 9350 | 9450 | 9700 | 10000 |
+--------------------------+------+-------+-------+-------+
------------------------------
Without 65def42b1d5:
+----------------------------------------------------------+
| Optimization: -O2 |
+--------------------------+---------------+---------------+
| | Text | CSV |
+--------------------------+-------+-------+-------+-------+
| WIDE | None | 1/3 | None | 1/3 |
+--------------------------+-------+-------+-------+-------+
| Old Master | 10550 | 11030 | 12250 | 14400 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 | 8890 | 10700 | 8870 | 14070 |
+--------------------------+-------+-------+-------+-------+
| | | | | |
+--------------------------+-------+-------+-------+-------+
| | Text | CSV |
+--------------------------+-------+-------+-------+-------+
| NARROW | None | 1/3 | None | 1/3 |
+--------------------------+-------+-------+-------+-------+
| Old Master | 9921 | 10205 | 10123 | 10420 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 | 9880 | 10070 | 10150 | 10400 |
+--------------------------+-------+-------+-------+-------+
| |
+----------------------------------------------------------+
| |
+----------------------------------------------------------+
| Optimization: -O3 |
+--------------------------+---------------+---------------+
| | Text | CSV |
+--------------------------+-------+-------+-------+-------+
| WIDE | None | 1/3 | None | 1/3 |
+--------------------------+-------+-------+-------+-------+
| Old Master | 10500 | 11100 | 12600 | 14580 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 | 8900 | 10660 | 8860 | 13990 |
+--------------------------+-------+-------+-------+-------+
| | | | | |
+--------------------------+-------+-------+-------+-------+
| | Text | CSV |
+--------------------------+-------+-------+-------+-------+
| NARROW | None | 1/3 | None | 1/3 |
+--------------------------+-------+-------+-------+-------+
| Old Master | 9600 | 9700 | 9800 | 10150 |
+--------------------------+-------+-------+-------+-------+
| Old Master + 0001 + 0002 | 9300 | 9470 | 9600 | 9880 |
+--------------------------+-------+-------+-------+-------+
--
Regards,
Nazir Bilal Yavuz
Microsoft
From 7acaeb3201ae4ae279bf8b25641bea7f8cb92cbe Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Wed, 4 Mar 2026 17:28:54 +0300
Subject: [PATCH v11] Speed up COPY FROM text/CSV parsing using SIMD
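This patch uses SIMD instructions to scan the COPY FROM input buffer for
special characters, skipping full vectors that contain none. To avoid
regressions, SIMD is disabled for the rest of the input once it
encounters a special character that is neither EOF nor EOL, or after two
consecutive lines that are too short to fill a full-sized vector.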
Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
src/backend/commands/copyfrom.c | 4 +
src/backend/commands/copyfromparse.c | 222 ++++++++++++++++++++++-
src/include/commands/copyfrom_internal.h | 4 +
3 files changed, 223 insertions(+), 7 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2f42f55e229..2aa52810ff1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1747,6 +1747,10 @@ BeginCopyFrom(ParseState *pstate,
cstate->cur_attval = NULL;
cstate->relname_only = false;
+ /* Initialize SIMD */
+ cstate->simd_enabled = true;
+ cstate->simd_failed_first_vector = false;
+
/*
* Allocate buffers for the input pipeline.
*
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index fbd13353efc..70e1a5a0410 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
#include "miscadmin.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
+#include "port/simd.h"
#include "utils/builtins.h"
#include "utils/rel.h"
@@ -158,6 +159,12 @@ static pg_attribute_always_inline bool NextCopyFromRawFieldsInternal(CopyFromSta
int *nfields,
bool is_csv);
+/* SIMD functions */
+#ifndef USE_NO_SIMD
+static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
+ bool *temp_hit_eof, int *temp_input_buf_ptr);
+#endif
+
/* Low-level communications functions */
static int CopyGetData(CopyFromState cstate, void *databuf,
@@ -1310,6 +1317,182 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
return result;
}
+#ifndef USE_NO_SIMD
+/*
+ * Use SIMD instructions to efficiently scan the input buffer for special
+ * characters (e.g., newline, carriage return, quote, and escape). This is
+ * faster than byte-by-byte iteration, especially on large buffers.
+ *
+ * Note that SIMD can be slower than the scalar path when the input
+ * contains many special characters. To avoid this regression, we disable
+ * SIMD for the rest of the input once we encounter a special character
+ * that is neither EOF nor EOL. SIMD is also disabled after two consecutive
+ * lines that are too short to fill a full-sized vector.
+ */
+static bool
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof, int *temp_input_buf_ptr)
+{
+ char quotec = '\0';
+ char escapec = '\0';
+ char *copy_input_buf;
+ int input_buf_ptr;
+ int copy_buf_len;
+ bool result = false;
+ bool unique_escapec = false;
+ bool first_vector = true;
+ Vector8 nl = vector8_broadcast('\n');
+ Vector8 cr = vector8_broadcast('\r');
+ Vector8 bs = vector8_broadcast('\\');
+ Vector8 quote = vector8_broadcast(0);
+ Vector8 escape = vector8_broadcast(0);
+
+ if (is_csv)
+ {
+ quotec = cstate->opts.quote[0];
+ escapec = cstate->opts.escape[0];
+
+ quote = vector8_broadcast(quotec);
+ if (quotec != escapec)
+ {
+ unique_escapec = true;
+ escape = vector8_broadcast(escapec);
+ }
+ }
+
+ /* For a little extra speed we copy these into local variables */
+ copy_input_buf = cstate->input_buf;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ while (true)
+ {
+ /* Load more data if needed */
+ if (sizeof(Vector8) >= copy_buf_len - input_buf_ptr)
+ {
+ REFILL_LINEBUF;
+
+ CopyLoadInputBuf(cstate);
+ /* update our local variables */
+ *temp_hit_eof = cstate->input_reached_eof;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /*
+ * If we are completely out of data, break out of the loop,
+ * reporting EOF.
+ */
+ if (INPUT_BUF_BYTES(cstate) <= 0)
+ {
+ result = true;
+ break;
+ }
+ }
+
+ if (copy_buf_len - input_buf_ptr > sizeof(Vector8))
+ {
+ Vector8 chunk;
+ Vector8 match = vector8_broadcast(0);
+
+ /* Load a chunk of data into a vector register */
+ vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+ if (is_csv)
+ {
+ match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+ match = vector8_or(match, vector8_eq(chunk, quote));
+ if (unique_escapec)
+ match = vector8_or(match, vector8_eq(chunk, escape));
+ }
+ else
+ {
+ match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+ match = vector8_or(match, vector8_eq(chunk, bs));
+ }
+
+ /* Check if we found any special characters */
+ if (vector8_is_highbit_set(match))
+ {
+ /*
+ * Found a special character. Advance up to that point and let
+ * the scalar code handle it.
+ */
+ uint32 mask;
+ int advance;
+ char c1,
+ c2;
+ bool simd_hit_eol,
+ simd_hit_eof;
+
+ mask = vector8_highbit_mask(match);
+ advance = pg_rightmost_one_pos32(mask);
+
+ input_buf_ptr += advance;
+ c1 = copy_input_buf[input_buf_ptr];
+
+ /*
+ * Since we stopped within the chunk and ((copy_buf_len -
+ * input_buf_ptr) > sizeof(Vector8)) is true,
+ * copy_input_buf[input_buf_ptr + 1] is guaranteed to be
+ * readable.
+ */
+ c2 = copy_input_buf[input_buf_ptr + 1];
+
+ simd_hit_eof = (c1 == '\\' && c2 == '.' && !is_csv);
+ simd_hit_eol = (c1 == '\r' || c1 == '\n');
+
+ /*
+ * Do not disable SIMD when we hit EOL or EOF characters. In
+ * practice, it does not matter for EOF because parsing ends
+ * there, but we keep the behavior consistent.
+ */
+ if (!(simd_hit_eof || simd_hit_eol))
+ cstate->simd_enabled = false;
+
+ /*
+ * We found a special character within the first vector, so
+ * this line is too short to skip even one full-sized vector.
+ * If this happens on two consecutive lines, disable SIMD for
+ * the rest of the input.
+ */
+ if (first_vector)
+ {
+ if (cstate->simd_failed_first_vector)
+ cstate->simd_enabled = false;
+
+ cstate->simd_failed_first_vector = true;
+ }
+
+ break;
+ }
+ else
+ {
+ /* No special characters found, so skip the entire chunk */
+ input_buf_ptr += sizeof(Vector8);
+ first_vector = false;
+ }
+ }
+
+ /*
+ * Even after refilling, not enough characters remain to fill a
+ * full-sized vector. That does not mean the current line is too short
+ * for a full-sized vector, so don't count it as a first-vector failure.
+ *
+ * The scalar code handles the rest of this line, and SIMD resumes
+ * from the next line.
+ */
+ else
+ {
+ first_vector = false;
+ break;
+ }
+ }
+
+ cstate->simd_failed_first_vector = first_vector;
+ *temp_input_buf_ptr = input_buf_ptr;
+ return result;
+}
+#endif
+
/*
* CopyReadLineText - inner loop of CopyReadLine for text mode
*/
@@ -1338,6 +1521,38 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
escapec = '\0';
}
+ /* Initialize input_buf_ptr here; the SIMD helper may advance it */
+ input_buf_ptr = cstate->input_buf_index;
+
+#ifndef USE_NO_SIMD
+ /* First try to run SIMD, then continue with the scalar path */
+ if (cstate->simd_enabled)
+ {
+ int temp_input_buf_ptr = input_buf_ptr;
+ bool temp_hit_eof = false;
+
+ result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
+ &temp_input_buf_ptr);
+ input_buf_ptr = temp_input_buf_ptr;
+ hit_eof = temp_hit_eof;
+
+ /* SIMD reached the end of the input, so we are done */
+ if (result)
+ {
+ /*
+ * Transfer any still-uncopied data to line_buf.
+ */
+ REFILL_LINEBUF;
+
+ return result;
+ }
+ }
+#endif
+
+ /* For a little extra speed we copy these into local variables */
+ copy_input_buf = cstate->input_buf;
+ copy_buf_len = cstate->input_buf_len;
+
/*
* The objective of this loop is to transfer the entire next input line
* into line_buf. Hence, we only care for detecting newlines (\r and/or
@@ -1359,14 +1574,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
* character to examine; any characters from input_buf_index to
* input_buf_ptr have been determined to be part of the line, but not yet
* transferred to line_buf.
- *
- * For a little extra speed within the loop, we copy input_buf and
- * input_buf_len into local variables.
*/
- copy_input_buf = cstate->input_buf;
- input_buf_ptr = cstate->input_buf_index;
- copy_buf_len = cstate->input_buf_len;
-
for (;;)
{
int prev_raw_ptr;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index f892c343157..4a748df8ac8 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,10 @@ typedef struct CopyFromStateData
const char *cur_attval; /* current att value for error messages */
bool relname_only; /* don't output line number, att, etc. */
+ /* SIMD variables */
+ bool simd_enabled; /* try SIMD in CopyReadLineText()? */
+ bool simd_failed_first_vector; /* last SIMD scan stopped in 1st vector */
+
/*
* Working state
*/
--
2.47.3