On 2026-02-08 Su 10:48 PM, jian he wrote:
On Sat, Feb 7, 2026 at 9:27 PM Junwang Zhao<[email protected]> wrote:
Here are some comments on v23:

0001: The refactor looks straightforward to me. Introducing a format
field should make future extensions easier. One suggestion is that we
could add some helper macros around format, for example:

#define IS_FORMAT_CSV(format) (format == COPY_FORMAT_CSV)
#define IS_FORMAT_TEXT_LIKE(format) \
     (format == COPY_FORMAT_TEXT || format == COPY_FORMAT_CSV)

I think this would improve readability.
Personally, I don't like marcos....

0002: Since you have moved the `CopyFormat enum` into 0001, the
following commit msg should be rephrased.

The CopyFormat enum was originally contributed by Joel Jacobson
[email protected], later refactored by Jian He to address various issues, and
further adapted by Junwang Zhao to support the newly introduced CopyToRoutine
struct (commit 2e4127b6d2).

- if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
- ereport(ERROR,
- (errcode(ERRCODE_SYNTAX_ERROR),
- /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
- errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
+ if (opts_out->delim)
+ {
+ if (opts_out->format == COPY_FORMAT_BINARY)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
+ errmsg("cannot specify %s in BINARY mode", "DELIMITER"));
+ else if (opts_out->format == COPY_FORMAT_JSON)
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify %s in JSON mode", "DELIMITER"));
+ }

Can we add a function that converts CopyFormat to a string? Treating
CopyFormat as %s in error messages would make the code shorter.
However, I'm not sure whether this aligns with translation
conventions, correct me if I'm wrong.

I don’t think this is worth the added complexity.
That said, I tried to simplify the code and changed it to:

     if (opts_out->delim &&
         (opts_out->format == COPY_FORMAT_BINARY ||
          opts_out->format == COPY_FORMAT_JSON))
         ereport(ERROR,
                 errcode(ERRCODE_SYNTAX_ERROR),
                 opts_out->format == COPY_FORMAT_BINARY
                 ? errmsg("cannot specify %s in BINARY mode", "DELIMITER")
                 : errmsg("cannot specify %s in JSON mode", "DELIMITER"));

- * CSV and text formats share the same TextLike routines except for the
+ * CSV and text, json formats share the same TextLike routines except for the

I'd suggest rewording to `CSV, text and json ...`. The same applied to
other parts in this patch.

sure.

0003: The commit message includes some changes(adapt the newly
introduced CopyToRoutine) that actually belong to 0002; it would be
better to remove them from this commit.

0002 commit message:
"""
This introduces the JSON format option for the COPY TO command, allowing users
to export query results or table data directly as a single JSON object or a
stream of JSON objects.

The JSON format is currently supported only for COPY TO operations; it
is not available for COPY FROM.

JSON format is incompatible with some standard text/CSV parsing or
formatting options,
including:
- HEADER
- DEFAULT
- NULL
- DELIMITER
- FORCE QUOTE / FORCE NOT NULL

Regression tests covering valid JSON exports and error handling for
incompatible options have been added to src/test/regress/sql/copy.sql.
"""

0003 commit message:
"""
Add option force_array for COPY JSON FORMAT
This adds the force_array option, which is available exclusively
when using COPY TO with the JSON format.

When enabled, this option wraps the output in a top-level JSON array
(enclosed in square brackets with comma-separated elements), making the
entire result a valid single JSON value. Without this option, the default
behavior is to output a stream of independent JSON objects.

Attempting to use this option with COPY FROM or with formats other than
JSON will raise an error.
"""

+ if (cstate->json_row_delim_needed && cstate->opts.force_array)
+ CopySendChar(cstate, ',');
+ else if (cstate->opts.force_array)
+ {
+ /* first row needs no delimiter */
+ CopySendChar(cstate, ' ');
+ cstate->json_row_delim_needed = true;
+ }

can we do this:

if (cstate->opts.force_array)
{
     if (cstate->json_row_delim_needed)
         CopySendChar(cstate, ',');
     else
     {
         /* first row needs no delimiter */
         CopySendChar(cstate, ' ');
         cstate->json_row_delim_needed = true;
     }
}

good suggestion.

If more people think WRAP_ARRAY is better than FORCE_ARRAY, we can
switch to it accordingly.
The change itself is quite straightforward.


I have reworked these. First I cleaned up a number of things in patches 2 and 3 (Thanks, Claude for the summary):

Patch 2: json format for COPY TO

copy.c:
  - "json" → "JSON" in the COPY FROM rejection error message.

  copyto.c:
  1. TupleDesc setup runs once, not every row — Added json_tupledesc_ready flag; the memcpy/populate_compact_attribute/BlessTupleDesc block is now guarded by if (!cstate->json_tupledesc_ready).   2. Comment rewritten — Old: "the slot's TupleDesc may change during query execution". New: explains BlessTupleDesc registers the RECORDOID descriptor so lookup_rowtype_tupdesc inside composite_to_json can
  find it.
  3. Eliminated per-row makeStringInfo() — Added StringInfoData json_buf to the struct, initialized once in CopyToTextLikeStart in copycontext. Each row does resetStringInfo instead of allocating a new
  StringInfo.
  4. Column list rejection added — Error: "column selection is not supported in JSON mode" when attnamelist != NIL.   5. Improved SendCopyBegin comment — Old: "JSON format is always one non-binary column". New: explains each CopyData message contains one complete JSON object.

  Tests:
  6. Added copy copytest (style) to stdout (format json) to the error-case block.   7. Added copyjsontype table test with json, jsonb columns — verifies values are embedded directly, not double-encoded. Covers json objects, scalars, arrays, nested objects, and nulls.

  Patch 3: Add option force_array

  copy.c:
  1. "json" → "JSON" in COPY FROM error (carried from patch 2).
  2. "can only used" → "can only be used" — grammar fix in FORCE_ARRAY error.

  copyto.c:
  3. Struct fields reorganized — JSON fields grouped under /* JSON format state */ comment with inline descriptions, instead of a standalone json_row_delim_needed with a vague comment.   4. Block comment updated — Was "CSV, text and json formats share the same TextLike routines except for the one-row callback". Now correctly notes JSON has its own one-row and end callbacks.   5. CopyToTextLikeEnd comment fixed — Was "text, CSV, and json", now "text and CSV" (JSON uses CopyToJsonEnd).   6. Cleaner start callback — FORCE_ARRAY bracket emission moved inside the if (format == JSON) block after json_buf init, instead of a separate top-level conditional.   7. Inherits all patch 2 fixes — TupleDesc guard, reusable json_buf, improved comments, column list rejection.

  Tests:
  8. --Error → -- should fail: force_array requires json format; --ok → -- force_array variants.
  9. copyjsontype test carried through from patch 2.

Then I reworked the way this works. In order to support column lists with JSON output, we need to deal with individual columns instead of whole records. This involved quite a number of changes, as can be seen in patch 4. This involved exporting a new small function from json.c.

The result is a lot cleaner, I believe, and in my benchmarking is faster by a factor of almost 2.


cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com

Reply via email to