On Fri, Oct 18, 2024, at 15:52, jian he wrote: > <refsect2> <title>Raw Format</title> is duplicated > <title>Raw Format</title> didn't mention the special handling of > end-of-data marker.
Thanks for reviewing, above fixed. Here is a summary of the changes since v10, thanks to the feedback: Handling of e.g. JSON and other structured text files that could contain newlines, in a seamless way seems important, so therefore the default is no delimiter for the raw format, so that the entire input is read as one data value for COPY FROM, and all column data is concatenated without delimiter for COPY TO. When specifying a delimiter for the raw format, it separates *rows*, and can be a multi-byte string, such as E'\r\n' to handle Windows text files. This has been documented under the DELIMITER option, as well as under the Raw Format section. This also means that HEADER cannot be supported for RAW, since where there is no delimiter, there would be no way to tell when the header line ends. For flexibility when exporting data, there is no restriction on what characters the column data can contain, which has been documented in this way: When using COPY TO with raw format and a specified DELIMITER, there is no check to prevent data values from containing the delimiter string, which could be problematic if it would be needed to import the data preserved using COPY FROM, since a data value containing the delimiter would then be split into two values. If this is a concern, a different format should be used instead. The refactoring is now in a separate first single commit, which seems necessary, to separate the new functionality, from the refactoring. Here are two examples on usage: 1. Example of importing/exporting entire JSON file % cat test.json [ { "id" : 1, "t_test" : "here's a \"string\"" } ] % psql CREATE TABLE t (c jsonb); \COPY t FROM test.json (FORMAT raw); SELECT * FROM t; c ---------------------------------------------- [{"id": 1, "t_test": "here's a \"string\""}] (1 row) \COPY t TO test.json (FORMAT raw); % cat test.json [{"id": 1, "t_test": "here's a \"string\""}]% Note: the ending "%" is just the terminal indicating there is no newline at the end, which is intended, since there is no delimiter specified. 2. Example of importing/exporting JSONL (newline-delimited JSON) % cat log.jsonl {"timestamp": "2024-10-17T09:15:30Z", "level": "INFO"} {"timestamp": "2024-10-17T09:16:10Z", "level": "ERROR"} {"timestamp": "2024-10-17T09:17:45Z", "level": "WARNING"} % psql \COPY t FROM log.jsonl (FORMAT raw, DELIMITER E'\n'); SELECT * FROM t; c ----------------------------------------------------------------------------------------------------- {"level": "INFO", "timestamp": "2024-10-17T09:15:30Z"} {"level": "ERROR", "timestamp": "2024-10-17T09:16:10Z"} {"level": "WARNING", "timestamp": "2024-10-17T09:17:45Z"} (3 rows) \COPY t TO log.jsonl (FORMAT raw, DELIMITER E'\n'); % cat log.jsonl {"level": "INFO", "timestamp": "2024-10-17T09:15:30Z"} {"level": "ERROR", "timestamp": "2024-10-17T09:16:10Z"} {"level": "WARNING", "timestamp": "2024-10-17T09:17:45Z"} > +COPY copy_raw_test (col) FROM :'filename' RAW; > we may need to support this. > since we not allow > COPY x from stdin text; > COPY x to stdout text; > so I think adding the RAW keyword in gram.y may not be necessary. Nice, I didn't know text was not supported either, so then it seems fine to only support the new syntax. RAW keyword and the grammar removed. > /* Complete COPY <sth> FROM|TO filename WITH (FORMAT */ > else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, > "WITH", "(", "FORMAT")) > COMPLETE_WITH("binary", "csv", "text"); > src/bin/psql/tab-complete.in.c, we can also add "raw". Nice, added. > escape option no regress test. Regress tests added for both ESCAPE and QUOTE, checking that they cannot be used with TEXT, RAW nor BINARY. > We already did column length checking at BeginCopyTo. > no need to "if (list_length(cstate->attnumlist) != 1)" error check in > CopyOneRowTo? Hmm, not sure really, since DoCopy() calls both BeginCopyTo() and DoCopyTo() which in turn calls CopyOneRowTo(), but CopyOneRowTo() is also being called from copy_dest_receive(). /Joel
v11-0001-Refactor-ProcessCopyOptions-introduce-CopyFormat-enu.patch
Description: Binary data
v11-0002-Add-raw-format-to-COPY-command.patch
Description: Binary data