vignesh C wrote: > I noticed that these tests are passing without applying patch too:
> +insert into copytest2(test) values('line1'), ('\.'), ('line2'); > +copy (select test from copytest2 order by test collate "C") to :'filename' > csv; > +-- get the data back in with copy > +truncate copytest2; > +copy copytest2(test) from :'filename' csv; > +select test from copytest2 order by test collate "C"; > > I was not sure if this was intentional. Can we add a test which fails > in HEAD and passes with the patch applied. Thanks for checking this out. Indeed, that was not intentional. I've been using static files in my tests and forgot that if the data was produced with COPY OUT, it would quote backslash-dot so that COPY IN could reload it without problem. PFA an updated version that uses \qecho to produce the data instead of COPY OUT. This test on unpatched HEAD shows that copytest2 is missing 2 rows after COPY IN. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite
From 0137963134ac88b009a8e93d6a39cabfaefe43f5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Daniel=20V=C3=A9rit=C3=A9?= <dan...@manitou-mail.org> Date: Tue, 19 Dec 2023 11:49:35 +0100 Subject: [PATCH v2] Support backslash-dot on a line by itself as valid data in COPY FROM...CSV --- doc/src/sgml/ref/psql-ref.sgml | 4 ---- src/backend/commands/copyfromparse.c | 13 ++--------- src/bin/psql/copy.c | 32 ++++++++++++++++++++-------- src/test/regress/expected/copy.out | 18 ++++++++++++++++ src/test/regress/sql/copy.sql | 12 +++++++++++ 5 files changed, 55 insertions(+), 24 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index cc7d797159..d94e3cacfc 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1119,10 +1119,6 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g destination, because all data must pass through the client/server connection. For large amounts of data the <acronym>SQL</acronym> command might be preferable. - Also, because of this pass-through method, <literal>\copy - ... from</literal> in <acronym>CSV</acronym> mode will erroneously - treat a <literal>\.</literal> data value alone on a line as an - end-of-input marker. </para> </tip> diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index f553734582..b4c291b25b 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -1137,7 +1137,6 @@ CopyReadLineText(CopyFromState cstate) bool result = false; /* CSV variables */ - bool first_char_in_line = true; bool in_quote = false, last_was_esc = false; char quotec = '\0'; @@ -1335,10 +1334,9 @@ CopyReadLineText(CopyFromState cstate) } /* - * In CSV mode, we only recognize \. alone on a line. This is because - * \. is a valid CSV data value. + * In CSV mode, \. is a valid CSV data value anywhere in the line. */ - if (c == '\\' && (!cstate->opts.csv_mode || first_char_in_line)) + if (c == '\\' && !cstate->opts.csv_mode) { char c2; @@ -1442,14 +1440,7 @@ CopyReadLineText(CopyFromState cstate) } } - /* - * This label is for CSV cases where \. appears at the start of a - * line, but there is more text after it, meaning it was a data value. - * We are more strict for \. in CSV mode because \. could be a data - * value, while in non-CSV mode, \. cannot be a data value. - */ not_end_of_copy: - first_char_in_line = false; } /* end of outer loop */ /* diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c index dbbbdb8898..f31a7acbb6 100644 --- a/src/bin/psql/copy.c +++ b/src/bin/psql/copy.c @@ -620,20 +620,34 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, PGresult **res) /* current line is done? */ if (buf[buflen - 1] == '\n') { - /* check for EOF marker, but not on a partial line */ - if (at_line_begin) + /* + * When at the beginning of the line, check for EOF marker. + * If the marker is found and the data is inlined, + * we must stop at this point. If not, the \. line can be + * sent to the server, and we let it decide whether + * it's an EOF or not depending on the format: in + * basic TEXT, \. is going to be interpreted as an EOF, in + * CSV, it will not. + */ + if (at_line_begin && copystream == pset.cur_cmd_source) { - /* - * This code erroneously assumes '\.' on a line alone - * inside a quoted CSV string terminates the \copy. - * https://www.postgresql.org/message-id/e1tdnvq-0001ju...@wrigleys.postgresql.org - * - * https://www.postgresql.org/message-id/bfcd57e4-8f23-4c3e-a5db-2571d0920...@beta.fastmail.com - */ if ((linelen == 3 && memcmp(fgresult, "\\.\n", 3) == 0) || (linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0)) { copydone = true; + /* + * Remove the EOF marker from the data sent, unless + * using the obsolete v2 protocol that needs it. + * In the case of CSV, the EOF marker must be + * removed, otherwise it would be interpreted by + * the server as valid data. + */ + if (copystream == pset.cur_cmd_source && + PQprotocolVersion(conn) >= 3) + { + *fgresult='\0'; + buflen -= linelen; + } } } diff --git a/src/test/regress/expected/copy.out b/src/test/regress/expected/copy.out index b48365ec98..645e849bbd 100644 --- a/src/test/regress/expected/copy.out +++ b/src/test/regress/expected/copy.out @@ -32,6 +32,24 @@ select * from copytest except select * from copytest2; -------+------+-------- (0 rows) +--- test unquoted \. as data inside CSV +-- do not use copy out to export the data, as it would quote \. +\o :filename +\qecho line1 +\qecho '\\.' +\qecho line2 +\o +-- get the data back in with copy +truncate copytest2; +copy copytest2(test) from :'filename' csv; +select test from copytest2 order by test collate "C"; + test +------- + \. + line1 + line2 +(3 rows) + -- test header line feature create temp table copytest3 ( c1 int, diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql index 43d2e906dd..68a58740cb 100644 --- a/src/test/regress/sql/copy.sql +++ b/src/test/regress/sql/copy.sql @@ -38,6 +38,18 @@ copy copytest2 from :'filename' csv quote '''' escape E'\\'; select * from copytest except select * from copytest2; +--- test unquoted \. as data inside CSV +-- do not use copy out to export the data, as it would quote \. +\o :filename +\qecho line1 +\qecho '\\.' +\qecho line2 +\o +-- get the data back in with copy +truncate copytest2; +copy copytest2(test) from :'filename' csv; +select test from copytest2 order by test collate "C"; + -- test header line feature -- 2.34.1