Hi, Thank you for working on this!
On Thu, 7 Aug 2025 at 04:49, Shinya Kato <shinya11.k...@gmail.com> wrote: > > Hi hackers, > > I have implemented SIMD optimization for the COPY FROM (FORMAT {csv, > text}) command and observed approximately a 5% performance > improvement. Please see the detailed test results below. I have been working on the same idea. I was not moving input_buf_ptr as far as possible, so I think your approach is better. Also, I did a benchmark on text format. I created a benchmark for line length in a table being from 1 byte to 1 megabyte.The peak improvement is line length being 4096 and the improvement is more than 20% [1], I saw no regression on your patch. > Idea > ==== > The current text/CSV parser processes input byte-by-byte, checking > whether each byte is a special character (\n, \r, quote, escape) or a > regular character, and transitions states in a state machine. This > sequential processing is inefficient and likely causes frequent branch > mispredictions due to the many if statements. > > I thought this problem could be addressed by leveraging SIMD and > vectorized operations for faster processing. > > Implementation Overview > ======================= > 1. Create a vector of special characters (e.g., Vector8 nl = > vector8_broadcast('\n');). > 2. Load the input buffer into a Vector8 variable called chunk. > 3. Perform vectorized operations between chunk and the special > character vectors to check if the buffer contains any special > characters. > 4-1. If no special characters are found, advance the input_buf_ptr by > sizeof(Vector8). > 4-2. If special characters are found, advance the input_buf_ptr as far > as possible, then fall back to the original text/CSV parser for > byte-by-byte processing. > ... > Thought? > I would appreciate feedback on the implementation and any suggestions > for further improvement. I have a couple of ideas that I was working on: --- + * However, SIMD optimization cannot be applied in the following cases: + * - Inside quoted fields, where escape sequences and closing quotes + * require sequential processing to handle correctly. I think you can continue SIMD inside quoted fields. Only important thing is you need to set last_was_esc to false when SIMD skipped the chunk. --- + * - When the remaining buffer size is smaller than the size of a SIMD + * vector register, as SIMD operations require processing data in + * fixed-size chunks. You run SIMD when 'copy_buf_len - input_buf_ptr >= sizeof(Vector8)' but you only call CopyLoadInputBuf() when 'input_buf_ptr >= copy_buf_len || need_data' so basically you need to wait at least the sizeof(Vector8) character to pass for the next SIMD. And in the worst case; if CopyLoadInputBuf() puts one character less than sizeof(Vector8), then you can't ever run SIMD. I think we need to make sure that CopyLoadInputBuf() loads at least the sizeof(Vector8) character to the input_buf so we do not encounter that problem. --- What do you think about adding SIMD to CopyReadAttributesText() and CopyReadAttributesCSV() functions? When I add your SIMD approach to CopyReadAttributesText() function, the improvement on the 4096 byte line length input [1] goes from 20% to 30%. --- I shared my ideas as a Feedback.txt file (.txt to stay off CFBot's radar for this thread). I hope these help, please let me know if you have any questions. -- Regards, Nazir Bilal Yavuz Microsoft
From b13f4cdf134eef5fbecf9ea06f9b1c99890b7c02 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz <byavu...@gmail.com> Date: Thu, 7 Aug 2025 13:27:34 +0300 Subject: [PATCH] Feedback --- src/backend/commands/copyfromparse.c | 55 ++++++++++++++++++++++++++-- 1 file changed, 51 insertions(+), 4 deletions(-) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 5aba0fa6cb7..dae5c1f698c 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -670,8 +670,12 @@ CopyLoadInputBuf(CopyFromState cstate) /* If we now have some unconverted data, try to convert it */ CopyConvertBuf(cstate); - /* If we now have some more input bytes ready, return them */ - if (INPUT_BUF_BYTES(cstate) > nbytes) + /* + * If we now have at least sizeof(Vector8) input bytes ready, return + * them. This is beneficial for SIMD processing in the + * CopyReadLineText() function. + */ + if (INPUT_BUF_BYTES(cstate) > nbytes + sizeof(Vector8)) return; /* @@ -1322,7 +1326,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv) * unsafe with the old v2 COPY protocol, but we don't support that * anymore. */ - if (input_buf_ptr >= copy_buf_len || need_data) + if (input_buf_ptr + sizeof(Vector8) >= copy_buf_len || need_data) { REFILL_LINEBUF; @@ -1359,7 +1363,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv) * vector register, as SIMD operations require processing data in * fixed-size chunks. */ - if (!in_quote && copy_buf_len - input_buf_ptr >= sizeof(Vector8)) + if (copy_buf_len - input_buf_ptr >= sizeof(Vector8)) { Vector8 chunk; Vector8 match; @@ -1395,6 +1399,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv) { /* No special characters found, so skip the entire chunk */ input_buf_ptr += sizeof(Vector8); + last_was_esc = false; continue; } } @@ -1650,6 +1655,11 @@ CopyReadAttributesText(CopyFromState cstate) char *cur_ptr; char *line_end_ptr; +#ifndef USE_NO_SIMD + Vector8 bs = vector8_broadcast('\\'); + Vector8 delim = vector8_broadcast(delimc);; +#endif + /* * We need a special case for zero-column tables: check that the input * line is empty, and return. @@ -1717,6 +1727,43 @@ CopyReadAttributesText(CopyFromState cstate) { char c; +#ifndef USE_NO_SIMD + if (line_end_ptr - cur_ptr >= sizeof(Vector8)) + { + Vector8 chunk; + Vector8 match; + uint32 mask; + + /* Load a chunk of data into a vector register */ + vector8_load(&chunk, (const uint8 *) cur_ptr); + + /* Create a mask of all special characters we need to stop at */ + match = vector8_or(vector8_eq(chunk, bs), vector8_eq(chunk, delim)); + + /* Check if we found any special characters */ + mask = vector8_highbit_mask(match); + if (mask != 0) + { + /* + * Found a special character. Advance up to that point and let + * the scalar code handle it. + */ + int advance = pg_rightmost_one_pos32(mask); + memcpy(output_ptr, cur_ptr, advance); + output_ptr += advance; + cur_ptr += advance; + } + else + { + /* No special characters found, so skip the entire chunk */ + memcpy(output_ptr, cur_ptr, sizeof(Vector8)); + output_ptr += sizeof(Vector8); + cur_ptr += sizeof(Vector8); + continue; + } + } +#endif + end_ptr = cur_ptr; if (cur_ptr >= line_end_ptr) break; -- 2.50.1