Hi, I've updated the patch set. See the attached v40 patch set.
In <cad21aoaxzwpc7jjpmtct80hnzmpa2sujkiqdyhweey8szsc...@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 23 Apr 2025 23:44:55 -0700,
  Masahiko Sawada <sawada.m...@gmail.com> wrote:

>> Are the followings correct?
>>
>> 1. Move invalid input patterns in
>>    src/test/modules/test_copy_format/sql/invalid.sql to
>>    src/test/regress/sql/copy.sql as much as possible.
>> 2. Create
>>    src/test/modules/test_copy_format/sql/test_copy_format.sql
>>    and move all contents in existing *.sql to the file.
>> 3. Add comments what the tests expect to
>>    src/test/modules/test_copy_format/sql/test_copy_format.sql.
>> 4. Remove CopyFormatOptions::{binary,csv_mode}.
>
> Agreed with the above items.

Done, except for 1., which is made unnecessary by 3. in the
following list:

----
>> There are 3 unconfirmed suggested changes for tests in:
>> https://www.postgresql.org/message-id/20250330.113126.433742864258096312.kou%40clear-code.com
>>
>> Here are my opinions for them:
>>
>> > 1.: There is no difference between single-quoting and
>> >     double-quoting here. Because the information what quote
>> >     was used for the given FORMAT value isn't remained
>> >     here. Should we update gram.y?
>> >
>> > 2.: I don't have a strong opinion for it. If nobody objects
>> >     it, I'll remove them.
>> >
>> > 3.: I don't have a strong opinion for it. If nobody objects
>> >     it, I'll remove them.
----

0005 is added for 4. Could you squash 0004 ("Use copy handler
for built-in formats") and 0005 ("Remove
CopyFormatOptions::{binary,csv_mode}") if needed when you
push?

>> 6. Use handler OID for detecting the default built-in format
>>    instead of comparing the given format as string.

Done.

>> 7. Update documentation.

Could someone help with this? 0007 is the draft commit for
this.
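To illustrate the idea behind item 6 for readers who haven't looked at the patch: the following is not the code from the patch, only a minimal standalone sketch. It assumes that the FORMAT name is resolved to a handler OID once and that later checks compare OIDs. All `Sketch`/`sketch_` names and the OID values are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for PostgreSQL's Oid type so that this compiles on its own. */
typedef unsigned int SketchOid;

#define SKETCH_INVALID_OID ((SketchOid) 0)

/* Made-up OIDs; real handler OIDs would be assigned in pg_proc.dat. */
#define SKETCH_TEXT_HANDLER_OID ((SketchOid) 9001)
#define SKETCH_CSV_HANDLER_OID ((SketchOid) 9002)
#define SKETCH_BINARY_HANDLER_OID ((SketchOid) 9003)

/* Resolve the FORMAT name to a handler OID once, at option-parsing time. */
SketchOid
sketch_lookup_handler(const char *fmt)
{
	if (strcmp(fmt, "text") == 0)
		return SKETCH_TEXT_HANDLER_OID;
	if (strcmp(fmt, "csv") == 0)
		return SKETCH_CSV_HANDLER_OID;
	if (strcmp(fmt, "binary") == 0)
		return SKETCH_BINARY_HANDLER_OID;
	return SKETCH_INVALID_OID;	/* unknown format */
}

/* Later checks compare OIDs instead of repeating string comparisons. */
bool
sketch_is_default_format(SketchOid handler)
{
	return handler == SKETCH_TEXT_HANDLER_OID;
}
```

With this shape, code that needs to know "is this the default (text) format?" only calls `sketch_is_default_format(handler)` and no longer depends on how the user spelled the format name.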
>> There are 3 unconfirmed suggested changes for tests in:
>> https://www.postgresql.org/message-id/20250330.113126.433742864258096312.kou%40clear-code.com
>>
>> Here are my opinions for them:
>>
>> > 1.: There is no difference between single-quoting and
>> >     double-quoting here. Because the information what quote
>> >     was used for the given FORMAT value isn't remained
>> >     here. Should we update gram.y?
>> >
>> > 2.: I don't have a strong opinion for it. If nobody objects
>> >     it, I'll remove them.
>> >
>> > 3.: I don't have a strong opinion for it. If nobody objects
>> >     it, I'll remove them.
>>
>> Is the 1. required for "ready for merge"? If so, is there
>> any suggestion? I don't have a strong opinion for it.
>>
>> If there are no more opinions for 2. and 3., I'll remove
>> them.
>
> Agreed.

1.: I didn't do anything because there was no suggestion.

2., 3.: Done.

> I think we would still need some rounds of reviews but the patch is
> getting in good shape.

I hope that this will be completed this year...


Thanks,
-- 
kou
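P.S. As a reading aid for the 2/6 patch ("Add support for adding custom COPY format"): the handler is called with is_from = true for COPY FROM and is_from = false for COPY TO, and the caller checks the leading node tag of the returned struct, as CopyFromGetRoutine() and CopyToGetRoutine() do with IsA(). The following is only a simplified standalone sketch of that convention. The `Sketch`/`sketch_` types and functions are hypothetical stand-ins, not PostgreSQL's real NodeTag, CopyToRoutine or CopyFromRoutine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the node tags generated for copyapi.h. */
typedef enum SketchNodeTag
{
	T_SketchCopyToRoutine = 1,
	T_SketchCopyFromRoutine,
} SketchNodeTag;

/* Each routine struct starts with its tag, like "NodeTag type" in the patch. */
typedef struct SketchCopyToRoutine
{
	SketchNodeTag type;
} SketchCopyToRoutine;

typedef struct SketchCopyFromRoutine
{
	SketchNodeTag type;
} SketchCopyFromRoutine;

static const SketchCopyToRoutine sketch_to_routine = {
	.type = T_SketchCopyToRoutine,
};

static const SketchCopyFromRoutine sketch_from_routine = {
	.type = T_SketchCopyFromRoutine,
};

/* One handler serves both directions; is_from selects the routine. */
const void *
sketch_copy_handler(bool is_from)
{
	if (is_from)
		return &sketch_from_routine;
	return &sketch_to_routine;
}

/* Caller-side check in the spirit of IsA(): read the leading tag. */
bool
sketch_is_a(const void *routine, SketchNodeTag tag)
{
	return routine != NULL && *(const SketchNodeTag *) routine == tag;
}
```

The tag check is what lets the caller report an error when a handler returns the wrong kind of struct (or NULL) instead of blindly casting the pointer.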
From a81eb07a4c92b8b34ed6fbe6610c54bb9b3bb2e4 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <k...@clear-code.com>
Date: Mon, 25 Nov 2024 13:58:33 +0900
Subject: [PATCH v40 1/6] Export CopyDest as private data

This is preparation for exporting CopyToStateData as private data.

CopyToStateData depends on CopyDest, so we need to export CopyDest
too.

But CopyDest and CopySource have the same enum value names, so we
can't export CopyDest as-is.

This uses the COPY_DEST_ prefix for CopyDest enum values. CopySource
uses the COPY_SOURCE_ prefix for consistency.
---
 src/backend/commands/copyfrom.c          |  4 ++--
 src/backend/commands/copyfromparse.c     | 10 ++++----
 src/backend/commands/copyto.c            | 30 ++++++++----------------
 src/include/commands/copyfrom_internal.h |  8 +++----
 src/include/commands/copyto_internal.h   | 28 ++++++++++++++++++++++
 5 files changed, 49 insertions(+), 31 deletions(-)
 create mode 100644 src/include/commands/copyto_internal.h

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index fbbbc09a97b..b4dad744547 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1709,7 +1709,7 @@ BeginCopyFrom(ParseState *pstate, pg_encoding_to_char(GetDatabaseEncoding())))); } - cstate->copy_src = COPY_FILE; /* default */ + cstate->copy_src = COPY_SOURCE_FILE; /* default */ cstate->whereClause = whereClause; @@ -1837,7 +1837,7 @@ BeginCopyFrom(ParseState *pstate, if (data_source_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_src = COPY_CALLBACK; + cstate->copy_src = COPY_SOURCE_CALLBACK; cstate->data_source_cb = data_source_cb; } else if (pipe) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index f5fc346e201..9f7171d1478 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); -
cstate->copy_src = COPY_FRONTEND; + cstate->copy_src = COPY_SOURCE_FRONTEND; cstate->fe_msgbuf = makeStringInfo(); /* We *must* flush here to ensure FE knows it can send. */ pq_flush(); @@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) switch (cstate->copy_src) { - case COPY_FILE: + case COPY_SOURCE_FILE: bytesread = fread(databuf, 1, maxread, cstate->copy_file); if (ferror(cstate->copy_file)) ereport(ERROR, @@ -257,7 +257,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) if (bytesread == 0) cstate->raw_reached_eof = true; break; - case COPY_FRONTEND: + case COPY_SOURCE_FRONTEND: while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof) { int avail; @@ -340,7 +340,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) bytesread += avail; } break; - case COPY_CALLBACK: + case COPY_SOURCE_CALLBACK: bytesread = cstate->data_source_cb(databuf, minread, maxread); break; } @@ -1172,7 +1172,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv) * after \. up to the protocol end of copy data. (XXX maybe better * not to treat \. as special?) 
*/ - if (cstate->copy_src == COPY_FRONTEND) + if (cstate->copy_src == COPY_SOURCE_FRONTEND) { int inbytes; diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index f87e405351d..d739826afbc 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -20,6 +20,7 @@ #include "access/tableam.h" #include "commands/copyapi.h" +#include "commands/copyto_internal.h" #include "commands/progress.h" #include "executor/execdesc.h" #include "executor/executor.h" @@ -36,17 +37,6 @@ #include "utils/rel.h" #include "utils/snapmgr.h" -/* - * Represents the different dest cases we need to worry about at - * the bottom level - */ -typedef enum CopyDest -{ - COPY_FILE, /* to file (or a piped program) */ - COPY_FRONTEND, /* to frontend */ - COPY_CALLBACK, /* to callback function */ -} CopyDest; - /* * This struct contains all the state variables used throughout a COPY TO * operation. @@ -69,7 +59,7 @@ typedef struct CopyToStateData /* low-level state data */ CopyDest copy_dest; /* type of copy source/destination */ - FILE *copy_file; /* used if copy_dest == COPY_FILE */ + FILE *copy_file; /* used if copy_dest == COPY_DEST_FILE */ StringInfo fe_msgbuf; /* used for all dests during COPY TO */ int file_encoding; /* file or remote side's character encoding */ @@ -401,7 +391,7 @@ SendCopyBegin(CopyToState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); - cstate->copy_dest = COPY_FRONTEND; + cstate->copy_dest = COPY_DEST_FRONTEND; } static void @@ -448,7 +438,7 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -482,11 +472,11 @@ CopySendEndOfRow(CopyToState cstate) errmsg("could not write to COPY file: %m"))); } break; - case COPY_FRONTEND: + case COPY_DEST_FRONTEND: /* Dump the accumulated row as one CopyData message */ 
(void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; - case COPY_CALLBACK: + case COPY_DEST_CALLBACK: cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len); break; } @@ -507,7 +497,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate) { switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: /* Default line termination depends on platform */ #ifndef WIN32 CopySendChar(cstate, '\n'); @@ -515,7 +505,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate) CopySendString(cstate, "\r\n"); #endif break; - case COPY_FRONTEND: + case COPY_DEST_FRONTEND: /* The FE/BE protocol uses \n as newline for all platforms */ CopySendChar(cstate, '\n'); break; @@ -903,12 +893,12 @@ BeginCopyTo(ParseState *pstate, /* See Multibyte encoding comment above */ cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding); - cstate->copy_dest = COPY_FILE; /* default */ + cstate->copy_dest = COPY_DEST_FILE; /* default */ if (data_dest_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_dest = COPY_CALLBACK; + cstate->copy_dest = COPY_DEST_CALLBACK; cstate->data_dest_cb = data_dest_cb; } else if (pipe) diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index c8b22af22d8..24157e11a73 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -24,9 +24,9 @@ */ typedef enum CopySource { - COPY_FILE, /* from file (or a piped program) */ - COPY_FRONTEND, /* from frontend */ - COPY_CALLBACK, /* from callback function */ + COPY_SOURCE_FILE, /* from file (or a piped program) */ + COPY_SOURCE_FRONTEND, /* from frontend */ + COPY_SOURCE_CALLBACK, /* from callback function */ } CopySource; /* @@ -64,7 +64,7 @@ typedef struct CopyFromStateData /* low-level state data */ CopySource copy_src; /* type of copy source */ FILE *copy_file; /* used if copy_src == COPY_FILE */ - StringInfo fe_msgbuf; /* used if copy_src == COPY_FRONTEND */ + StringInfo 
fe_msgbuf; /* used if copy_src == COPY_SOURCE_FRONTEND */ EolType eol_type; /* EOL type of input */ int file_encoding; /* file or remote side's character encoding */ diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h new file mode 100644 index 00000000000..42ddb37a8a2 --- /dev/null +++ b/src/include/commands/copyto_internal.h @@ -0,0 +1,28 @@ +/*------------------------------------------------------------------------- + * + * copyto_internal.h + * Internal definitions for COPY TO command. + * + * + * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/commands/copyto_internal.h + * + *------------------------------------------------------------------------- + */ +#ifndef COPYTO_INTERNAL_H +#define COPYTO_INTERNAL_H + +/* + * Represents the different dest cases we need to worry about at + * the bottom level + */ +typedef enum CopyDest +{ + COPY_DEST_FILE, /* to file (or a piped program) */ + COPY_DEST_FRONTEND, /* to frontend */ + COPY_DEST_CALLBACK, /* to callback function */ +} CopyDest; + +#endif /* COPYTO_INTERNAL_H */ -- 2.47.2
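A note on why the 1/6 patch needs the rename: in C, all enumerators share one namespace, so once both enums are visible in the same translation unit, two enums that each define COPY_FILE, COPY_FRONTEND and COPY_CALLBACK fail to compile. The standalone sketch below copies the two prefixed enums from the patch and shows that they can coexist. The helper `sketch_copy_dest_name()` is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <string.h>

/* As in the patch: enumerators for the COPY FROM side. */
typedef enum CopySource
{
	COPY_SOURCE_FILE,			/* from file (or a piped program) */
	COPY_SOURCE_FRONTEND,		/* from frontend */
	COPY_SOURCE_CALLBACK,		/* from callback function */
} CopySource;

/* As in the patch: enumerators for the COPY TO side. */
typedef enum CopyDest
{
	COPY_DEST_FILE,				/* to file (or a piped program) */
	COPY_DEST_FRONTEND,			/* to frontend */
	COPY_DEST_CALLBACK,			/* to callback function */
} CopyDest;

/* Hypothetical helper (not in the patch): name a destination for logging. */
const char *
sketch_copy_dest_name(CopyDest dest)
{
	switch (dest)
	{
		case COPY_DEST_FILE:
			return "file";
		case COPY_DEST_FRONTEND:
			return "frontend";
		case COPY_DEST_CALLBACK:
			return "callback";
	}
	return "unknown";
}
```

Without the prefixes, including both headers from one file would produce a redeclaration error for each of the three shared enumerator names.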
From 398994b555e3b508ce26fc33199bf9badbfc82d5 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <k...@clear-code.com>
Date: Thu, 27 Mar 2025 11:14:43 +0900
Subject: [PATCH v40 2/6] Add support for adding custom COPY format

This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a handler returns a CopyToRoutine for COPY TO and a CopyFromRoutine
for COPY FROM.

Whether the handler is called for COPY TO or COPY FROM is passed as
the "is_from" argument:

    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine

This also adds a test module for a custom COPY handler.
---
 src/backend/commands/copy.c                   |  31 ++++-
 src/backend/commands/copyfrom.c               |  20 +++-
 src/backend/commands/copyto.c                 |  72 +++--------
 src/backend/nodes/Makefile                    |   1 +
 src/backend/nodes/gen_node_support.pl         |   2 +
 src/backend/utils/adt/pseudotypes.c           |   1 +
 src/include/catalog/pg_proc.dat               |   6 +
 src/include/catalog/pg_type.dat               |   6 +
 src/include/commands/copy.h                   |   3 +-
 src/include/commands/copyapi.h                |   4 +
 src/include/commands/copyto_internal.h        |  55 +++++++++
 src/include/nodes/meson.build                 |   1 +
 src/test/modules/Makefile                     |   1 +
 src/test/modules/meson.build                  |   1 +
 src/test/modules/test_copy_format/.gitignore  |   4 +
 src/test/modules/test_copy_format/Makefile    |  23 ++++
 .../expected/test_copy_format.out             | 107 +++++++++++++++++
 src/test/modules/test_copy_format/meson.build |  33 +++++
 .../test_copy_format/sql/test_copy_format.sql |  52 ++++++++
 .../test_copy_format--1.0.sql                 |  24 ++++
 .../test_copy_format/test_copy_format.c       | 113 ++++++++++++++++++
 .../test_copy_format/test_copy_format.control |   4 +
 22 files changed, 505 insertions(+), 59 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644
src/test/modules/test_copy_format/meson.build create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql create mode 100644 src/test/modules/test_copy_format/test_copy_format.c create mode 100644 src/test/modules/test_copy_format/test_copy_format.control diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 74ae42b19a7..9515c4d5786 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -32,10 +32,12 @@ #include "parser/parse_coerce.h" #include "parser/parse_collate.h" #include "parser/parse_expr.h" +#include "parser/parse_func.h" #include "parser/parse_relation.h" #include "utils/acl.h" #include "utils/builtins.h" #include "utils/lsyscache.h" +#include "utils/regproc.h" #include "utils/rel.h" #include "utils/rls.h" @@ -531,10 +533,31 @@ ProcessCopyOptions(ParseState *pstate, else if (strcmp(fmt, "binary") == 0) opts_out->binary = true; else - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY format \"%s\" not recognized", fmt), - parser_errposition(pstate, defel->location))); + { + List *qualified_format; + Oid arg_types[1]; + Oid handler = InvalidOid; + + qualified_format = stringToQualifiedNameList(fmt, NULL); + arg_types[0] = INTERNALOID; + handler = LookupFuncName(qualified_format, 1, + arg_types, true); + if (!OidIsValid(handler)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY format \"%s\" not recognized", fmt), + parser_errposition(pstate, defel->location))); + + /* check that handler has correct return type */ + if (get_func_rettype(handler) != COPY_HANDLEROID) + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("function %s must return type %s", + fmt, "copy_handler"), + parser_errposition(pstate, defel->location))); + + opts_out->handler = handler; + } } else if (strcmp(defel->defname, "freeze") == 0) { diff --git a/src/backend/commands/copyfrom.c 
b/src/backend/commands/copyfrom.c index b4dad744547..3d86e8a8328 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -129,6 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate); /* text format */ static const CopyFromRoutine CopyFromRoutineText = { + .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromTextLikeInFunc, .CopyFromStart = CopyFromTextLikeStart, .CopyFromOneRow = CopyFromTextOneRow, @@ -137,6 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = { /* CSV format */ static const CopyFromRoutine CopyFromRoutineCSV = { + .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromTextLikeInFunc, .CopyFromStart = CopyFromTextLikeStart, .CopyFromOneRow = CopyFromCSVOneRow, @@ -145,6 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = { /* binary format */ static const CopyFromRoutine CopyFromRoutineBinary = { + .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromBinaryInFunc, .CopyFromStart = CopyFromBinaryStart, .CopyFromOneRow = CopyFromBinaryOneRow, @@ -155,7 +158,22 @@ static const CopyFromRoutine CopyFromRoutineBinary = { static const CopyFromRoutine * CopyFromGetRoutine(const CopyFormatOptions *opts) { - if (opts->csv_mode) + if (OidIsValid(opts->handler)) + { + Datum datum; + Node *routine; + + datum = OidFunctionCall1(opts->handler, BoolGetDatum(true)); + routine = (Node *) DatumGetPointer(datum); + if (routine == NULL || !IsA(routine, CopyFromRoutine)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function %s.%s did not return CopyFromRoutine struct", + get_namespace_name(get_func_namespace(opts->handler)), + get_func_name(opts->handler)))); + return castNode(CopyFromRoutine, routine); + } + else if (opts->csv_mode) return &CopyFromRoutineCSV; else if (opts->binary) return &CopyFromRoutineBinary; diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index d739826afbc..265b847e255 100644 --- a/src/backend/commands/copyto.c +++ 
b/src/backend/commands/copyto.c @@ -22,9 +22,7 @@ #include "commands/copyapi.h" #include "commands/copyto_internal.h" #include "commands/progress.h" -#include "executor/execdesc.h" #include "executor/executor.h" -#include "executor/tuptable.h" #include "libpq/libpq.h" #include "libpq/pqformat.h" #include "mb/pg_wchar.h" @@ -37,56 +35,6 @@ #include "utils/rel.h" #include "utils/snapmgr.h" -/* - * This struct contains all the state variables used throughout a COPY TO - * operation. - * - * Multi-byte encodings: all supported client-side encodings encode multi-byte - * characters by having the first byte's high bit set. Subsequent bytes of the - * character can have the high bit not set. When scanning data in such an - * encoding to look for a match to a single-byte (ie ASCII) character, we must - * use the full pg_encoding_mblen() machinery to skip over multibyte - * characters, else we might find a false match to a trailing byte. In - * supported server encodings, there is no possibility of a false match, and - * it's faster to make useless comparisons to trailing bytes than it is to - * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true - * when we have to do it the hard way. - */ -typedef struct CopyToStateData -{ - /* format-specific routines */ - const CopyToRoutine *routine; - - /* low-level state data */ - CopyDest copy_dest; /* type of copy source/destination */ - FILE *copy_file; /* used if copy_dest == COPY_DEST_FILE */ - StringInfo fe_msgbuf; /* used for all dests during COPY TO */ - - int file_encoding; /* file or remote side's character encoding */ - bool need_transcoding; /* file encoding diff from server? */ - bool encoding_embeds_ascii; /* ASCII can be non-first byte? 
*/ - - /* parameters from the COPY command */ - Relation rel; /* relation to copy to */ - QueryDesc *queryDesc; /* executable query to copy from */ - List *attnumlist; /* integer list of attnums to copy */ - char *filename; /* filename, or NULL for STDOUT */ - bool is_program; /* is 'filename' a program to popen? */ - copy_data_dest_cb data_dest_cb; /* function for writing data */ - - CopyFormatOptions opts; - Node *whereClause; /* WHERE condition (or NULL) */ - - /* - * Working state - */ - MemoryContext copycontext; /* per-copy execution context */ - - FmgrInfo *out_functions; /* lookup info for output functions */ - MemoryContext rowcontext; /* per-row evaluation context */ - uint64 bytes_processed; /* number of bytes processed so far */ -} CopyToStateData; - /* DestReceiver for COPY (query) TO */ typedef struct { @@ -140,6 +88,7 @@ static void CopySendInt16(CopyToState cstate, int16 val); /* text format */ static const CopyToRoutine CopyToRoutineText = { + .type = T_CopyToRoutine, .CopyToStart = CopyToTextLikeStart, .CopyToOutFunc = CopyToTextLikeOutFunc, .CopyToOneRow = CopyToTextOneRow, @@ -148,6 +97,7 @@ static const CopyToRoutine CopyToRoutineText = { /* CSV format */ static const CopyToRoutine CopyToRoutineCSV = { + .type = T_CopyToRoutine, .CopyToStart = CopyToTextLikeStart, .CopyToOutFunc = CopyToTextLikeOutFunc, .CopyToOneRow = CopyToCSVOneRow, @@ -156,6 +106,7 @@ static const CopyToRoutine CopyToRoutineCSV = { /* binary format */ static const CopyToRoutine CopyToRoutineBinary = { + .type = T_CopyToRoutine, .CopyToStart = CopyToBinaryStart, .CopyToOutFunc = CopyToBinaryOutFunc, .CopyToOneRow = CopyToBinaryOneRow, @@ -166,7 +117,22 @@ static const CopyToRoutine CopyToRoutineBinary = { static const CopyToRoutine * CopyToGetRoutine(const CopyFormatOptions *opts) { - if (opts->csv_mode) + if (OidIsValid(opts->handler)) + { + Datum datum; + Node *routine; + + datum = OidFunctionCall1(opts->handler, BoolGetDatum(false)); + routine = (Node *) 
DatumGetPointer(datum); + if (routine == NULL || !IsA(routine, CopyToRoutine)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function %s.%s did not return CopyToRoutine struct", + get_namespace_name(get_func_namespace(opts->handler)), + get_func_name(opts->handler)))); + return castNode(CopyToRoutine, routine); + } + else if (opts->csv_mode) return &CopyToRoutineCSV; else if (opts->binary) return &CopyToRoutineBinary; diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile index 77ddb9ca53f..dc6c1087361 100644 --- a/src/backend/nodes/Makefile +++ b/src/backend/nodes/Makefile @@ -50,6 +50,7 @@ node_headers = \ access/sdir.h \ access/tableam.h \ access/tsmapi.h \ + commands/copyapi.h \ commands/event_trigger.h \ commands/trigger.h \ executor/tuptable.h \ diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl old mode 100644 new mode 100755 index 77659b0f760..d688bbea3a0 --- a/src/backend/nodes/gen_node_support.pl +++ b/src/backend/nodes/gen_node_support.pl @@ -62,6 +62,7 @@ my @all_input_files = qw( access/sdir.h access/tableam.h access/tsmapi.h + commands/copyapi.h commands/event_trigger.h commands/trigger.h executor/tuptable.h @@ -86,6 +87,7 @@ my @nodetag_only_files = qw( access/sdir.h access/tableam.h access/tsmapi.h + commands/copyapi.h commands/event_trigger.h commands/trigger.h executor/tuptable.h diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c index 317a1f2b282..f2ebc21ca56 100644 --- a/src/backend/utils/adt/pseudotypes.c +++ b/src/backend/utils/adt/pseudotypes.c @@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler); +PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(internal); PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement); PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray); diff --git 
a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 62beb71da28..ba46bfa48a8 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -7888,6 +7888,12 @@ { oid => '3312', descr => 'I/O', proname => 'tsm_handler_out', prorettype => 'cstring', proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' }, +{ oid => '8753', descr => 'I/O', + proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler', + proargtypes => 'cstring', prosrc => 'copy_handler_in' }, +{ oid => '8754', descr => 'I/O', + proname => 'copy_handler_out', prorettype => 'cstring', + proargtypes => 'copy_handler', prosrc => 'copy_handler_out' }, { oid => '267', descr => 'I/O', proname => 'table_am_handler_in', proisstrict => 'f', prorettype => 'table_am_handler', proargtypes => 'cstring', diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index 6dca77e0a22..bddf9fb4fbe 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -633,6 +633,12 @@ typcategory => 'P', typinput => 'tsm_handler_in', typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-', typalign => 'i' }, +{ oid => '8752', + descr => 'pseudo-type for the result of a COPY TO/FROM handler function', + typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p', + typcategory => 'P', typinput => 'copy_handler_in', + typoutput => 'copy_handler_out', typreceive => '-', typsend => '-', + typalign => 'i' }, { oid => '269', descr => 'pseudo-type for the result of a table AM handler function', typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p', diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 06dfdfef721..6df1f8a3b9b 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -87,9 +87,10 @@ typedef struct CopyFormatOptions CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ int64 reject_limit; /* maximum 
tolerable number of errors */ List *convert_select; /* list of column names (can be NIL) */ + Oid handler; /* handler function for custom format routine */ } CopyFormatOptions; -/* These are private in commands/copy[from|to].c */ +/* These are private in commands/copy[from|to]_internal.h */ typedef struct CopyFromStateData *CopyFromState; typedef struct CopyToStateData *CopyToState; diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 2a2d2f9876b..53ad3337f86 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -22,6 +22,8 @@ */ typedef struct CopyToRoutine { + NodeTag type; + /* * Set output function information. This callback is called once at the * beginning of COPY TO. @@ -60,6 +62,8 @@ typedef struct CopyToRoutine */ typedef struct CopyFromRoutine { + NodeTag type; + /* * Set input function information. This callback is called once at the * beginning of COPY FROM. diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h index 42ddb37a8a2..da796131988 100644 --- a/src/include/commands/copyto_internal.h +++ b/src/include/commands/copyto_internal.h @@ -14,6 +14,11 @@ #ifndef COPYTO_INTERNAL_H #define COPYTO_INTERNAL_H +#include "commands/copy.h" +#include "executor/execdesc.h" +#include "executor/tuptable.h" +#include "nodes/execnodes.h" + /* * Represents the different dest cases we need to worry about at * the bottom level @@ -25,4 +30,54 @@ typedef enum CopyDest COPY_DEST_CALLBACK, /* to callback function */ } CopyDest; +/* + * This struct contains all the state variables used throughout a COPY TO + * operation. + * + * Multi-byte encodings: all supported client-side encodings encode multi-byte + * characters by having the first byte's high bit set. Subsequent bytes of the + * character can have the high bit not set. 
When scanning data in such an + * encoding to look for a match to a single-byte (ie ASCII) character, we must + * use the full pg_encoding_mblen() machinery to skip over multibyte + * characters, else we might find a false match to a trailing byte. In + * supported server encodings, there is no possibility of a false match, and + * it's faster to make useless comparisons to trailing bytes than it is to + * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true + * when we have to do it the hard way. + */ +typedef struct CopyToStateData +{ + /* format-specific routines */ + const CopyToRoutine *routine; + + /* low-level state data */ + CopyDest copy_dest; /* type of copy source/destination */ + FILE *copy_file; /* used if copy_dest == COPY_DEST_FILE */ + StringInfo fe_msgbuf; /* used for all dests during COPY TO */ + + int file_encoding; /* file or remote side's character encoding */ + bool need_transcoding; /* file encoding diff from server? */ + bool encoding_embeds_ascii; /* ASCII can be non-first byte? */ + + /* parameters from the COPY command */ + Relation rel; /* relation to copy to */ + QueryDesc *queryDesc; /* executable query to copy from */ + List *attnumlist; /* integer list of attnums to copy */ + char *filename; /* filename, or NULL for STDOUT */ + bool is_program; /* is 'filename' a program to popen? 
*/ + copy_data_dest_cb data_dest_cb; /* function for writing data */ + + CopyFormatOptions opts; + Node *whereClause; /* WHERE condition (or NULL) */ + + /* + * Working state + */ + MemoryContext copycontext; /* per-copy execution context */ + + FmgrInfo *out_functions; /* lookup info for output functions */ + MemoryContext rowcontext; /* per-row evaluation context */ + uint64 bytes_processed; /* number of bytes processed so far */ +} CopyToStateData; + #endif /* COPYTO_INTERNAL_H */ diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build index d1ca24dd32f..96e70e7f38b 100644 --- a/src/include/nodes/meson.build +++ b/src/include/nodes/meson.build @@ -12,6 +12,7 @@ node_support_input_i = [ 'access/sdir.h', 'access/tableam.h', 'access/tsmapi.h', + 'commands/copyapi.h', 'commands/event_trigger.h', 'commands/trigger.h', 'executor/tuptable.h', diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index aa1d27bbed3..9bf5d58cdae 100644 --- a/src/test/modules/Makefile +++ b/src/test/modules/Makefile @@ -17,6 +17,7 @@ SUBDIRS = \ test_aio \ test_bloomfilter \ test_copy_callbacks \ + test_copy_format \ test_custom_rmgrs \ test_ddl_deparse \ test_dsa \ diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build index 9de0057bd1d..5fd06de2737 100644 --- a/src/test/modules/meson.build +++ b/src/test/modules/meson.build @@ -16,6 +16,7 @@ subdir('ssl_passphrase_callback') subdir('test_aio') subdir('test_bloomfilter') subdir('test_copy_callbacks') +subdir('test_copy_format') subdir('test_custom_rmgrs') subdir('test_ddl_deparse') subdir('test_dsa') diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore new file mode 100644 index 00000000000..5dcb3ff9723 --- /dev/null +++ b/src/test/modules/test_copy_format/.gitignore @@ -0,0 +1,4 @@ +# Generated subdirectories +/log/ +/results/ +/tmp_check/ diff --git a/src/test/modules/test_copy_format/Makefile 
b/src/test/modules/test_copy_format/Makefile new file mode 100644 index 00000000000..8497f91624d --- /dev/null +++ b/src/test/modules/test_copy_format/Makefile @@ -0,0 +1,23 @@ +# src/test/modules/test_copy_format/Makefile + +MODULE_big = test_copy_format +OBJS = \ + $(WIN32RES) \ + test_copy_format.o +PGFILEDESC = "test_copy_format - test custom COPY FORMAT" + +EXTENSION = test_copy_format +DATA = test_copy_format--1.0.sql + +REGRESS = test_copy_format + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = src/test/modules/test_copy_format +top_builddir = ../../../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out new file mode 100644 index 00000000000..3916b766615 --- /dev/null +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -0,0 +1,107 @@ +CREATE TABLE copy_data (a smallint, b integer, c bigint); +INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +-- No WITH SCHEMA. It installs custom COPY handlers to the current +-- schema. +CREATE EXTENSION test_copy_format; +-- We can find a custom COPY handler without schema. 
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format'); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: attribute: smallint +NOTICE: CopyFromInFunc: attribute: integer +NOTICE: CopyFromInFunc: attribute: bigint +NOTICE: CopyFromStart: the number of attributes: 3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromEnd +COPY copy_data TO stdout WITH (FORMAT 'test_copy_format'); +NOTICE: test_copy_format: is_from=false +NOTICE: CopyToOutFunc: attribute: smallint +NOTICE: CopyToOutFunc: attribute: integer +NOTICE: CopyToOutFunc: attribute: bigint +NOTICE: CopyToStart: the number of attributes: 3 +NOTICE: CopyToOneRow: the number of valid values: 3 +NOTICE: CopyToOneRow: the number of valid values: 3 +NOTICE: CopyToOneRow: the number of valid values: 3 +NOTICE: CopyToEnd +DROP EXTENSION test_copy_format; +-- Install custom COPY handlers to a schema that isn't included in +-- search_path. +CREATE SCHEMA test_schema; +CREATE EXTENSION test_copy_format WITH SCHEMA test_schema; +-- We can find a custom COPY handler by qualified name. 
+COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format'); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: attribute: smallint +NOTICE: CopyFromInFunc: attribute: integer +NOTICE: CopyFromInFunc: attribute: bigint +NOTICE: CopyFromStart: the number of attributes: 3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromEnd +COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format'); +NOTICE: test_copy_format: is_from=false +NOTICE: CopyToOutFunc: attribute: smallint +NOTICE: CopyToOutFunc: attribute: integer +NOTICE: CopyToOutFunc: attribute: bigint +NOTICE: CopyToStart: the number of attributes: 3 +NOTICE: CopyToOneRow: the number of valid values: 3 +NOTICE: CopyToOneRow: the number of valid values: 3 +NOTICE: CopyToOneRow: the number of valid values: 3 +NOTICE: CopyToEnd +-- We can't find a custom COPY handler without schema when search_path +-- doesn't include the schema where we installed custom COPY handlers. +COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format'); +ERROR: COPY format "test_copy_format" not recognized +LINE 1: COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format'); + ^ +COPY copy_data TO stdout WITH (FORMAT 'test_copy_format'); +ERROR: COPY format "test_copy_format" not recognized +LINE 1: COPY copy_data TO stdout WITH (FORMAT 'test_copy_format'); + ^ +-- We can find a custom COPY handler without schema when search_path +-- includes the schema where we installed custom COPY handlers. 
+SET search_path = test_schema,public; +COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format'); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: attribute: smallint +NOTICE: CopyFromInFunc: attribute: integer +NOTICE: CopyFromInFunc: attribute: bigint +NOTICE: CopyFromStart: the number of attributes: 3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromEnd +COPY copy_data TO stdout WITH (FORMAT 'test_copy_format'); +NOTICE: test_copy_format: is_from=false +NOTICE: CopyToOutFunc: attribute: smallint +NOTICE: CopyToOutFunc: attribute: integer +NOTICE: CopyToOutFunc: attribute: bigint +NOTICE: CopyToStart: the number of attributes: 3 +NOTICE: CopyToOneRow: the number of valid values: 3 +NOTICE: CopyToOneRow: the number of valid values: 3 +NOTICE: CopyToOneRow: the number of valid values: 3 +NOTICE: CopyToEnd +RESET search_path; +-- Invalid cases with qualified name. +-- Input type is wrong +COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type'); +ERROR: COPY format "test_schema.test_copy_format_wrong_input_type" not recognized +LINE 1: COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_cop... + ^ +COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type'); +ERROR: COPY format "test_schema.test_copy_format_wrong_input_type" not recognized +LINE 1: COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy... + ^ +-- Return type is wrong +COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type'); +ERROR: function test_schema.test_copy_format_wrong_return_type must return type copy_handler +LINE 1: COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_cop... + ^ +COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type'); +ERROR: function test_schema.test_copy_format_wrong_return_type must return type copy_handler +LINE 1: COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy... 
+ ^ +-- Returned value is wrong +COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value'); +ERROR: COPY handler function test_schema.test_copy_format_wrong_return_value did not return CopyFromRoutine struct +COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value'); +ERROR: COPY handler function test_schema.test_copy_format_wrong_return_value did not return CopyToRoutine struct +DROP TABLE copy_data; +DROP EXTENSION test_copy_format; +DROP SCHEMA test_schema; diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build new file mode 100644 index 00000000000..a45a2e0a039 --- /dev/null +++ b/src/test/modules/test_copy_format/meson.build @@ -0,0 +1,33 @@ +# Copyright (c) 2025, PostgreSQL Global Development Group + +test_copy_format_sources = files( + 'test_copy_format.c', +) + +if host_system == 'windows' + test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [ + '--NAME', 'test_copy_format', + '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',]) +endif + +test_copy_format = shared_module('test_copy_format', + test_copy_format_sources, + kwargs: pg_test_mod_args, +) +test_install_libs += test_copy_format + +test_install_data += files( + 'test_copy_format.control', + 'test_copy_format--1.0.sql', +) + +tests += { + 'name': 'test_copy_format', + 'sd': meson.current_source_dir(), + 'bd': meson.current_build_dir(), + 'regress': { + 'sql': [ + 'test_copy_format', + ], + }, +} diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql new file mode 100644 index 00000000000..b262794f878 --- /dev/null +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -0,0 +1,52 @@ +CREATE TABLE copy_data (a smallint, b integer, c bigint); +INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); + +-- No WITH SCHEMA. 
It installs custom COPY handlers to the current +-- schema. +CREATE EXTENSION test_copy_format; +-- We can find a custom COPY handler without schema. +COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format'); +\. +COPY copy_data TO stdout WITH (FORMAT 'test_copy_format'); +DROP EXTENSION test_copy_format; + + +-- Install custom COPY handlers to a schema that isn't included in +-- search_path. +CREATE SCHEMA test_schema; +CREATE EXTENSION test_copy_format WITH SCHEMA test_schema; + +-- We can find a custom COPY handler by qualified name. +COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format'); +\. +COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format'); + +-- We can't find a custom COPY handler without schema when search_path +-- doesn't include the schema where we installed custom COPY handlers. +COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format'); +COPY copy_data TO stdout WITH (FORMAT 'test_copy_format'); + +-- We can find a custom COPY handler without schema when search_path +-- includes the schema where we installed custom COPY handlers. +SET search_path = test_schema,public; +COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format'); +\. +COPY copy_data TO stdout WITH (FORMAT 'test_copy_format'); +RESET search_path; + +-- Invalid cases with qualified name. 
+ +-- Input type is wrong +COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type'); +COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type'); +-- Return type is wrong +COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type'); +COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type'); +-- Returned value is wrong +COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value'); +COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value'); + + +DROP TABLE copy_data; +DROP EXTENSION test_copy_format; +DROP SCHEMA test_schema; diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql new file mode 100644 index 00000000000..c1a137181f8 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql @@ -0,0 +1,24 @@ +/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "CREATE EXTENSION test_copy_format" to load this file. 
\quit + +CREATE FUNCTION test_copy_format(internal) + RETURNS copy_handler + AS 'MODULE_PATHNAME', 'test_copy_format' + LANGUAGE C; + +CREATE FUNCTION test_copy_format_wrong_input_type(bool) + RETURNS copy_handler + AS 'MODULE_PATHNAME', 'test_copy_format' + LANGUAGE C; + +CREATE FUNCTION test_copy_format_wrong_return_type(internal) + RETURNS bool + AS 'MODULE_PATHNAME', 'test_copy_format' + LANGUAGE C; + +CREATE FUNCTION test_copy_format_wrong_return_value(internal) + RETURNS copy_handler + AS 'MODULE_PATHNAME', 'test_copy_format_wrong_return_value' + LANGUAGE C; diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c new file mode 100644 index 00000000000..1d754201336 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -0,0 +1,113 @@ +/*-------------------------------------------------------------------------- + * + * test_copy_format.c + * Code for testing custom COPY format. + * + * Portions Copyright (c) 2025, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/test/modules/test_copy_format/test_copy_format.c + * + * ------------------------------------------------------------------------- + */ + +#include "postgres.h" + +#include "commands/copyapi.h" +#include "commands/defrem.h" +#include "utils/builtins.h" + +PG_MODULE_MAGIC; + +static void +TestCopyFromInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam) +{ + ereport(NOTICE, (errmsg("CopyFromInFunc: attribute: %s", format_type_be(atttypid)))); +} + +static void +TestCopyFromStart(CopyFromState cstate, TupleDesc tupDesc) +{ + ereport(NOTICE, (errmsg("CopyFromStart: the number of attributes: %d", tupDesc->natts))); +} + +static bool +TestCopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + ereport(NOTICE, (errmsg("CopyFromOneRow"))); + return false; +} + +static void +TestCopyFromEnd(CopyFromState cstate) +{ + ereport(NOTICE, 
(errmsg("CopyFromEnd"))); +} + +static const CopyFromRoutine CopyFromRoutineTestCopyFormat = { + .type = T_CopyFromRoutine, + .CopyFromInFunc = TestCopyFromInFunc, + .CopyFromStart = TestCopyFromStart, + .CopyFromOneRow = TestCopyFromOneRow, + .CopyFromEnd = TestCopyFromEnd, +}; + +static void +TestCopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + ereport(NOTICE, (errmsg("CopyToOutFunc: attribute: %s", format_type_be(atttypid)))); +} + +static void +TestCopyToStart(CopyToState cstate, TupleDesc tupDesc) +{ + ereport(NOTICE, (errmsg("CopyToStart: the number of attributes: %d", tupDesc->natts))); +} + +static void +TestCopyToOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + ereport(NOTICE, (errmsg("CopyToOneRow: the number of valid values: %u", slot->tts_nvalid))); +} + +static void +TestCopyToEnd(CopyToState cstate) +{ + ereport(NOTICE, (errmsg("CopyToEnd"))); +} + +static const CopyToRoutine CopyToRoutineTestCopyFormat = { + .type = T_CopyToRoutine, + .CopyToOutFunc = TestCopyToOutFunc, + .CopyToStart = TestCopyToStart, + .CopyToOneRow = TestCopyToOneRow, + .CopyToEnd = TestCopyToEnd, +}; + +PG_FUNCTION_INFO_V1(test_copy_format); +Datum +test_copy_format(PG_FUNCTION_ARGS) +{ + bool is_from = PG_GETARG_BOOL(0); + + ereport(NOTICE, + (errmsg("test_copy_format: is_from=%s", is_from ? 
"true" : "false"))); + + if (is_from) + PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat); + else + PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); +} + +PG_FUNCTION_INFO_V1(test_copy_format_wrong_return_value); +Datum +test_copy_format_wrong_return_value(PG_FUNCTION_ARGS) +{ + bool is_from = PG_GETARG_BOOL(0); + + if (is_from) + PG_RETURN_CSTRING(pstrdup("is_from=true")); + else + PG_RETURN_CSTRING(pstrdup("is_from=false")); +} diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control new file mode 100644 index 00000000000..f05a6362358 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format.control @@ -0,0 +1,4 @@ +comment = 'Test code for custom COPY format' +default_version = '1.0' +module_pathname = '$libdir/test_copy_format' +relocatable = true -- 2.47.2
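For readers without a PostgreSQL tree at hand, the handler contract exercised by this test module can be sketched as a small standalone model. This is only an illustration under stated assumptions: every Model* name below is hypothetical, the tag enum stands in for the T_CopyFromRoutine / T_CopyToRoutine node tags, and returning NULL stands in for the ereport(ERROR) raised when a handler "did not return CopyFromRoutine struct".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the T_CopyFromRoutine / T_CopyToRoutine tags. */
typedef enum ModelTag
{
	MODEL_T_INVALID = 0,
	MODEL_T_COPY_FROM_ROUTINE,
	MODEL_T_COPY_TO_ROUTINE
} ModelTag;

/* Each routine struct starts with its tag, like the .type field above. */
typedef struct ModelCopyFromRoutine
{
	ModelTag	type;
} ModelCopyFromRoutine;

typedef struct ModelCopyToRoutine
{
	ModelTag	type;
} ModelCopyToRoutine;

/* Stand-in for a returned value that is not a routine at all. */
typedef struct ModelOther
{
	ModelTag	type;
} ModelOther;

typedef const void *(*ModelHandler) (bool is_from);

const ModelCopyFromRoutine model_from_routine = {MODEL_T_COPY_FROM_ROUTINE};
const ModelCopyToRoutine model_to_routine = {MODEL_T_COPY_TO_ROUTINE};
const ModelOther model_other = {MODEL_T_INVALID};

/* Well-behaved handler: picks the routine by COPY direction. */
const void *
model_good_handler(bool is_from)
{
	if (is_from)
		return &model_from_routine;
	return &model_to_routine;
}

/* Misbehaving handler: returns something that is not a routine. */
const void *
model_bad_handler(bool is_from)
{
	(void) is_from;
	return &model_other;
}

/* Tag check; valid because every model struct begins with a ModelTag. */
static bool
model_is_a(const void *node, ModelTag tag)
{
	return node != NULL && *(const ModelTag *) node == tag;
}

/* Returns NULL where the real code would raise an ERROR. */
const ModelCopyFromRoutine *
model_get_from_routine(ModelHandler handler)
{
	const void *routine;

	assert(handler != NULL);
	routine = handler(true);
	if (!model_is_a(routine, MODEL_T_COPY_FROM_ROUTINE))
		return NULL;
	return (const ModelCopyFromRoutine *) routine;
}

/* Same check for the COPY TO direction. */
const ModelCopyToRoutine *
model_get_to_routine(ModelHandler handler)
{
	const void *routine;

	assert(handler != NULL);
	routine = handler(false);
	if (!model_is_a(routine, MODEL_T_COPY_TO_ROUTINE))
		return NULL;
	return (const ModelCopyToRoutine *) routine;
}
```

The point of the tag check is that a function declared to return copy_handler can still hand back arbitrary data, so the caller has to validate the returned pointer rather than trust the SQL-level signature. That is the case the test_copy_format_wrong_return_value function covers.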
From 18618368721678d78934251ff8243705013458f0 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <k...@clear-code.com> Date: Thu, 27 Mar 2025 11:24:15 +0900 Subject: [PATCH v40 3/6] Add support for implementing custom COPY handler as extension * TO: Add CopyToStateData::opaque that can be used to keep data for custom COPY TO handler implementation * TO: Export CopySendEndOfRow() to send end of row data as CopyToStateFlush() * FROM: Add CopyFromStateData::opaque that can be used to keep data for custom COPY FROM handler implementation * FROM: Export CopyGetData() to get the next data as CopyFromStateGetData() * FROM: Add CopyFromSkipErrorRow() for "ON_ERROR ignore" and "LOG_VERBOSITY verbose" COPY FROM extensions must call CopyFromSkipErrorRow() when the CopyFromOneRow callback reports an error via errsave(). CopyFromSkipErrorRow() handles the "ON_ERROR ignore" and "LOG_VERBOSITY verbose" cases. --- src/backend/commands/copyfromparse.c | 93 ++++++++++++------- src/backend/commands/copyto.c | 12 +++ src/include/commands/copyapi.h | 6 ++ src/include/commands/copyfrom_internal.h | 3 + src/include/commands/copyto_internal.h | 3 + .../expected/test_copy_format.out | 50 ++++++++++ .../test_copy_format/sql/test_copy_format.sql | 35 +++++++ .../test_copy_format/test_copy_format.c | 80 +++++++++++++++- 8 files changed, 245 insertions(+), 37 deletions(-) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 9f7171d1478..de68b53b000 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -739,6 +739,17 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes) return copied_bytes; } +/* + * Export CopyGetData() for extensions. We want to keep CopyGetData() as a + * static function for optimization. CopyGetData() calls in this file may be + * optimized by a compiler.
+ */ +int +CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread) +{ + return CopyGetData(cstate, dest, minread, maxread); +} + /* * This function is exposed for use by extensions that read raw fields in the * next line. See NextCopyFromRawFieldsInternal() for details. @@ -927,6 +938,51 @@ CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true); } +/* + * Call this when you report an error by errsave() in your CopyFromOneRow + * callback. This handles "ON_ERROR ignore" and "LOG_VERBOSITY verbose" cases + * for you. + */ +void +CopyFromSkipErrorRow(CopyFromState cstate) +{ + Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); + + cstate->num_errors++; + + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + { + /* + * Since we emit line number and column info in the below notice + * message, we suppress error context information other than the + * relation name. + */ + Assert(!cstate->relname_only); + cstate->relname_only = true; + + if (cstate->cur_attval) + { + char *attval; + + attval = CopyLimitPrintoutLength(cstate->cur_attval); + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"", + cstate->cur_lineno, + cstate->cur_attname, + attval)); + pfree(attval); + } + else + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input", + cstate->cur_lineno, + cstate->cur_attname)); + + /* reset relname_only */ + cstate->relname_only = false; + } +} + /* * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
* @@ -1033,42 +1089,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, (Node *) cstate->escontext, &values[m])) { - Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); - - cstate->num_errors++; - - if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) - { - /* - * Since we emit line number and column info in the below - * notice message, we suppress error context information other - * than the relation name. - */ - Assert(!cstate->relname_only); - cstate->relname_only = true; - - if (cstate->cur_attval) - { - char *attval; - - attval = CopyLimitPrintoutLength(cstate->cur_attval); - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"", - cstate->cur_lineno, - cstate->cur_attname, - attval)); - pfree(attval); - } - else - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input", - cstate->cur_lineno, - cstate->cur_attname)); - - /* reset relname_only */ - cstate->relname_only = false; - } - + CopyFromSkipErrorRow(cstate); return true; } diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 265b847e255..d6fcfdfb9b1 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -454,6 +454,18 @@ CopySendEndOfRow(CopyToState cstate) resetStringInfo(fe_msgbuf); } +/* + * Export CopySendEndOfRow() for extensions. We want to keep + * CopySendEndOfRow() as a static function for + * optimization. CopySendEndOfRow() calls in this file may be optimized by a + * compiler. + */ +void +CopyToStateFlush(CopyToState cstate) +{ + CopySendEndOfRow(cstate); +} + /* * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the * line termination and do common appropriate things for the end of row. 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 53ad3337f86..500ece7d5bb 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -56,6 +56,8 @@ typedef struct CopyToRoutine void (*CopyToEnd) (CopyToState cstate); } CopyToRoutine; +extern void CopyToStateFlush(CopyToState cstate); + /* * API structure for a COPY FROM format implementation. Note this must be * allocated in a server-lifetime manner, typically as a static const struct. @@ -106,4 +108,8 @@ typedef struct CopyFromRoutine void (*CopyFromEnd) (CopyFromState cstate); } CopyFromRoutine; +extern int CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread); + +extern void CopyFromSkipErrorRow(CopyFromState cstate); + #endif /* COPYAPI_H */ diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index 24157e11a73..f9e27152313 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -181,6 +181,9 @@ typedef struct CopyFromStateData #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) uint64 bytes_processed; /* number of bytes processed so far */ + + /* For custom format implementation */ + void *opaque; /* private space */ } CopyFromStateData; extern void ReceiveCopyBegin(CopyFromState cstate); diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h index da796131988..3bd9d702bf0 100644 --- a/src/include/commands/copyto_internal.h +++ b/src/include/commands/copyto_internal.h @@ -78,6 +78,9 @@ typedef struct CopyToStateData FmgrInfo *out_functions; /* lookup info for output functions */ MemoryContext rowcontext; /* per-row evaluation context */ uint64 bytes_processed; /* number of bytes processed so far */ + + /* For custom format implementation */ + void *opaque; /* private space */ } CopyToStateData; #endif /* COPYTO_INTERNAL_H */ diff --git 
a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out index 3916b766615..47a875f0ab1 100644 --- a/src/test/modules/test_copy_format/expected/test_copy_format.out +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -4,6 +4,8 @@ INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); -- schema. CREATE EXTENSION test_copy_format; -- We can find a custom COPY handler without schema. +-- 987 is accepted. +-- 654 is a hard error because ON_ERROR is stop by default. COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format'); NOTICE: test_copy_format: is_from=true NOTICE: CopyFromInFunc: attribute: smallint @@ -11,7 +13,50 @@ NOTICE: CopyFromInFunc: attribute: integer NOTICE: CopyFromInFunc: attribute: bigint NOTICE: CopyFromStart: the number of attributes: 3 NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +ERROR: invalid value: "6" +CONTEXT: COPY copy_data, line 2, column a: "6" +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: attribute: smallint +NOTICE: CopyFromInFunc: attribute: integer +NOTICE: CopyFromInFunc: attribute: bigint +NOTICE: CopyFromStart: the number of attributes: 3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: 1 row was skipped due to data type incompatibility NOTICE: CopyFromEnd +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. 
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: attribute: smallint +NOTICE: CopyFromInFunc: attribute: integer +NOTICE: CopyFromInFunc: attribute: bigint +NOTICE: CopyFromStart: the number of attributes: 3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: skipping row due to data type incompatibility at line 2 for column "a": "6" +NOTICE: CopyFromOneRow +NOTICE: 1 row was skipped due to data type incompatibility +NOTICE: CopyFromEnd +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +-- 321 is a hard error. +COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: attribute: smallint +NOTICE: CopyFromInFunc: attribute: integer +NOTICE: CopyFromInFunc: attribute: bigint +NOTICE: CopyFromStart: the number of attributes: 3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +ERROR: too much lines: 3 +CONTEXT: COPY copy_data, line 3 COPY copy_data TO stdout WITH (FORMAT 'test_copy_format'); NOTICE: test_copy_format: is_from=false NOTICE: CopyToOutFunc: attribute: smallint @@ -21,7 +66,12 @@ NOTICE: CopyToStart: the number of attributes: 3 NOTICE: CopyToOneRow: the number of valid values: 3 NOTICE: CopyToOneRow: the number of valid values: 3 NOTICE: CopyToOneRow: the number of valid values: 3 +NOTICE: CopyToOneRow: the number of valid values: 3 +NOTICE: CopyToOneRow: the number of valid values: 3 NOTICE: CopyToEnd +-- Reset data. +TRUNCATE copy_data; +INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); DROP EXTENSION test_copy_format; -- Install custom COPY handlers to a schema that isn't included in -- search_path. 
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql index b262794f878..c7beb2fb8ae 100644 --- a/src/test/modules/test_copy_format/sql/test_copy_format.sql +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -4,10 +4,45 @@ INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); -- No WITH SCHEMA. It installs custom COPY handlers to the current -- schema. CREATE EXTENSION test_copy_format; + -- We can find a custom COPY handler without schema. + +-- 987 is accepted. +-- 654 is a hard error because ON_ERROR is stop by default. COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format'); +987 +654 \. + +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +987 +654 +\. + +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose); +987 +654 +\. + +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +-- 321 is a hard error. +COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +987 +654 +321 +\. + COPY copy_data TO stdout WITH (FORMAT 'test_copy_format'); + +-- Reset data. 
+TRUNCATE copy_data; +INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); + DROP EXTENSION test_copy_format; diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c index 1d754201336..34ec693a7ec 100644 --- a/src/test/modules/test_copy_format/test_copy_format.c +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -14,6 +14,7 @@ #include "postgres.h" #include "commands/copyapi.h" +#include "commands/copyfrom_internal.h" #include "commands/defrem.h" #include "utils/builtins.h" @@ -35,8 +36,85 @@ TestCopyFromStart(CopyFromState cstate, TupleDesc tupDesc) static bool TestCopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) { + int n_attributes = list_length(cstate->attnumlist); + char *line; + int line_size = n_attributes + 1; /* +1 is for new line */ + int read_bytes; + ereport(NOTICE, (errmsg("CopyFromOneRow"))); - return false; + + cstate->cur_lineno++; + line = palloc(line_size); + read_bytes = CopyFromStateGetData(cstate, line, line_size, line_size); + if (read_bytes == 0) + return false; + if (read_bytes != line_size) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("one line must be %d bytes: %d", + line_size, read_bytes))); + + if (cstate->cur_lineno == 1) + { + /* Success */ + TupleDesc tupDesc = RelationGetDescr(cstate->rel); + ListCell *cur; + int i = 0; + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + if (att->atttypid == INT2OID) + { + values[i] = Int16GetDatum(line[i] - '0'); + } + else if (att->atttypid == INT4OID) + { + values[i] = Int32GetDatum(line[i] - '0'); + } + else if (att->atttypid == INT8OID) + { + values[i] = Int64GetDatum(line[i] - '0'); + } + nulls[i] = false; + i++; + } + } + else if (cstate->cur_lineno == 2) + { + /* Soft error */ + TupleDesc tupDesc = RelationGetDescr(cstate->rel); + int 
attnum = lfirst_int(list_head(cstate->attnumlist)); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + char value[2]; + + cstate->cur_attname = NameStr(att->attname); + value[0] = line[0]; + value[1] = '\0'; + cstate->cur_attval = value; + errsave((Node *) cstate->escontext, + ( + errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid value: \"%c\"", line[0]))); + CopyFromSkipErrorRow(cstate); + cstate->cur_attname = NULL; + cstate->cur_attval = NULL; + return true; + } + else + { + /* Hard error */ + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("too much lines: %llu", + (unsigned long long) cstate->cur_lineno))); + } + + return true; } static void -- 2.47.2
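The per-row control flow of the test format in this patch (line 1 accepted, line 2 a soft error, line 3 and later a hard error) may be easier to follow as a standalone model. The sketch below does not use PostgreSQL headers. All Model* names are hypothetical, the -1 return value stands in for an ERROR aborting COPY, and the assumption that errsave() raises immediately when ON_ERROR is stop is modelled by returning MODEL_ROW_FAILED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum ModelOnError
{
	MODEL_ON_ERROR_STOP,
	MODEL_ON_ERROR_IGNORE
} ModelOnError;

typedef enum ModelRowResult
{
	MODEL_ROW_OK,				/* row stored */
	MODEL_ROW_SKIPPED,			/* soft error, row skipped */
	MODEL_ROW_EOF,				/* no more input */
	MODEL_ROW_FAILED			/* hard error, COPY aborts */
} ModelRowResult;

/* Hypothetical subset of the COPY FROM state used by the test format. */
typedef struct ModelCopyFromState
{
	ModelOnError on_error;
	uint64_t	cur_lineno;
	uint64_t	num_errors;
} ModelCopyFromState;

/* Models CopyFromSkipErrorRow(): only legal when ON_ERROR is not stop. */
void
model_skip_error_row(ModelCopyFromState *st)
{
	assert(st->on_error != MODEL_ON_ERROR_STOP);
	st->num_errors++;
}

/* Models TestCopyFromOneRow(): the outcome depends only on the line number. */
ModelRowResult
model_one_row(ModelCopyFromState *st, bool has_input)
{
	st->cur_lineno++;
	if (!has_input)
		return MODEL_ROW_EOF;
	if (st->cur_lineno == 1)
		return MODEL_ROW_OK;
	if (st->cur_lineno == 2)
	{
		/* With ON_ERROR stop the soft error becomes a hard error. */
		if (st->on_error == MODEL_ON_ERROR_STOP)
			return MODEL_ROW_FAILED;
		model_skip_error_row(st);
		return MODEL_ROW_SKIPPED;
	}
	return MODEL_ROW_FAILED;
}

/* Feeds n_lines rows; returns the accepted row count, or -1 on hard error. */
int
model_copy_from(ModelCopyFromState *st, int n_lines)
{
	int			accepted = 0;
	int			consumed = 0;

	for (;;)
	{
		ModelRowResult r = model_one_row(st, consumed < n_lines);

		if (r == MODEL_ROW_EOF)
			return accepted;
		if (r == MODEL_ROW_FAILED)
			return -1;
		consumed++;
		if (r == MODEL_ROW_OK)
			accepted++;
	}
}
```

With two input lines this model fails under ON_ERROR stop and accepts one row with one skipped row under ON_ERROR ignore. A third line fails in either mode. Those are the same outcomes the expected output in this patch records for the 987 / 654 / 321 inputs.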
>From ed454fd1998bca012182b977c227b4a0caa3ccd6 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <k...@clear-code.com> Date: Thu, 27 Mar 2025 11:56:45 +0900 Subject: [PATCH v40 4/6] Use copy handlers for built-in formats This adds copy handlers for text, csv and binary. We can simplify Copy{To,From}GetRoutine() by this. We'll be able to remove CopyFormatOptions::{binary,csv_mode} when we add more callbacks to Copy{To,From}Routine and move format specific routines to Copy{To,From}Routine::*. --- src/backend/commands/copy.c | 101 ++++++++++++------ src/backend/commands/copyfrom.c | 42 ++++---- src/backend/commands/copyto.c | 42 ++++---- src/include/catalog/pg_proc.dat | 11 ++ src/include/commands/copy.h | 2 +- src/include/commands/copyfrom_internal.h | 6 +- src/include/commands/copyto_internal.h | 6 +- .../expected/test_copy_format.out | 35 ++++++ .../test_copy_format/sql/test_copy_format.sql | 32 ++++++ .../test_copy_format--1.0.sql | 15 +++ 10 files changed, 211 insertions(+), 81 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 9515c4d5786..38ed8bccacd 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -22,7 +22,9 @@ #include "access/table.h" #include "access/xact.h" #include "catalog/pg_authid.h" -#include "commands/copy.h" +#include "commands/copyapi.h" +#include "commands/copyto_internal.h" +#include "commands/copyfrom_internal.h" #include "commands/defrem.h" #include "executor/executor.h" #include "mb/pg_wchar.h" @@ -521,43 +523,45 @@ ProcessCopyOptions(ParseState *pstate, if (strcmp(defel->defname, "format") == 0) { - char *fmt = defGetString(defel); + char *format = defGetString(defel); + List *qualified_format; + char *schema; + char *fmt; + Oid arg_types[1]; + Oid handler = InvalidOid; if (format_specified) errorConflictingDefElem(defel, pstate); format_specified = true; - if (strcmp(fmt, "text") == 0) - /* default format */ ; - else if (strcmp(fmt, "csv") == 0) - opts_out->csv_mode = true; - 
else if (strcmp(fmt, "binary") == 0) - opts_out->binary = true; - else + + qualified_format = stringToQualifiedNameList(format, NULL); + DeconstructQualifiedName(qualified_format, &schema, &fmt); + if (!schema || strcmp(schema, "pg_catalog") == 0) { - List *qualified_format; - Oid arg_types[1]; - Oid handler = InvalidOid; - - qualified_format = stringToQualifiedNameList(fmt, NULL); - arg_types[0] = INTERNALOID; - handler = LookupFuncName(qualified_format, 1, - arg_types, true); - if (!OidIsValid(handler)) - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY format \"%s\" not recognized", fmt), - parser_errposition(pstate, defel->location))); - - /* check that handler has correct return type */ - if (get_func_rettype(handler) != COPY_HANDLEROID) - ereport(ERROR, - (errcode(ERRCODE_WRONG_OBJECT_TYPE), - errmsg("function %s must return type %s", - fmt, "copy_handler"), - parser_errposition(pstate, defel->location))); - - opts_out->handler = handler; + if (strcmp(fmt, "csv") == 0) + opts_out->csv_mode = true; + else if (strcmp(fmt, "binary") == 0) + opts_out->binary = true; } + + arg_types[0] = INTERNALOID; + handler = LookupFuncName(qualified_format, 1, + arg_types, true); + if (!OidIsValid(handler)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY format \"%s\" not recognized", format), + parser_errposition(pstate, defel->location))); + + /* check that handler has correct return type */ + if (get_func_rettype(handler) != COPY_HANDLEROID) + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("function %s must return type %s", + format, "copy_handler"), + parser_errposition(pstate, defel->location))); + + opts_out->handler = handler; } else if (strcmp(defel->defname, "freeze") == 0) { @@ -1040,3 +1044,36 @@ CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist) return attnums; } + +Datum +copy_text_handler(PG_FUNCTION_ARGS) +{ + bool is_from = PG_GETARG_BOOL(0); + + if (is_from) + 
PG_RETURN_POINTER(&CopyFromRoutineText); + else + PG_RETURN_POINTER(&CopyToRoutineText); +} + +Datum +copy_csv_handler(PG_FUNCTION_ARGS) +{ + bool is_from = PG_GETARG_BOOL(0); + + if (is_from) + PG_RETURN_POINTER(&CopyFromRoutineCSV); + else + PG_RETURN_POINTER(&CopyToRoutineCSV); +} + +Datum +copy_binary_handler(PG_FUNCTION_ARGS) +{ + bool is_from = PG_GETARG_BOOL(0); + + if (is_from) + PG_RETURN_POINTER(&CopyFromRoutineBinary); + else + PG_RETURN_POINTER(&CopyToRoutineBinary); +} diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 3d86e8a8328..74a8051c24c 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -45,6 +45,7 @@ #include "rewrite/rewriteHandler.h" #include "storage/fd.h" #include "tcop/tcopprot.h" +#include "utils/fmgroids.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/portal.h" @@ -128,7 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate); */ /* text format */ -static const CopyFromRoutine CopyFromRoutineText = { +const CopyFromRoutine CopyFromRoutineText = { .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromTextLikeInFunc, .CopyFromStart = CopyFromTextLikeStart, @@ -137,7 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = { }; /* CSV format */ -static const CopyFromRoutine CopyFromRoutineCSV = { +const CopyFromRoutine CopyFromRoutineCSV = { .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromTextLikeInFunc, .CopyFromStart = CopyFromTextLikeStart, @@ -146,7 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = { }; /* binary format */ -static const CopyFromRoutine CopyFromRoutineBinary = { +const CopyFromRoutine CopyFromRoutineBinary = { .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromBinaryInFunc, .CopyFromStart = CopyFromBinaryStart, @@ -158,28 +159,23 @@ static const CopyFromRoutine CopyFromRoutineBinary = { static const CopyFromRoutine * CopyFromGetRoutine(const CopyFormatOptions *opts) { - if (OidIsValid(opts->handler)) 
- { - Datum datum; - Node *routine; - - datum = OidFunctionCall1(opts->handler, BoolGetDatum(true)); - routine = (Node *) DatumGetPointer(datum); - if (routine == NULL || !IsA(routine, CopyFromRoutine)) - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY handler function %s.%s did not return CopyFromRoutine struct", - get_namespace_name(get_func_namespace(opts->handler)), - get_func_name(opts->handler)))); - return castNode(CopyFromRoutine, routine); - } - else if (opts->csv_mode) - return &CopyFromRoutineCSV; - else if (opts->binary) - return &CopyFromRoutineBinary; + Oid handler = opts->handler; + Datum datum; + Node *routine; /* default is text */ - return &CopyFromRoutineText; + if (!OidIsValid(handler)) + handler = F_TEXT_INTERNAL; + + datum = OidFunctionCall1(handler, BoolGetDatum(true)); + routine = (Node *) DatumGetPointer(datum); + if (routine == NULL || !IsA(routine, CopyFromRoutine)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function %s.%s did not return CopyFromRoutine struct", + get_namespace_name(get_func_namespace(handler)), + get_func_name(handler)))); + return castNode(CopyFromRoutine, routine); } /* Implementation of the start callback for text and CSV formats */ diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index d6fcfdfb9b1..4e1b154cad2 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -30,6 +30,7 @@ #include "pgstat.h" #include "storage/fd.h" #include "tcop/tcopprot.h" +#include "utils/fmgroids.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/rel.h" @@ -87,7 +88,7 @@ static void CopySendInt16(CopyToState cstate, int16 val); */ /* text format */ -static const CopyToRoutine CopyToRoutineText = { +const CopyToRoutine CopyToRoutineText = { .type = T_CopyToRoutine, .CopyToStart = CopyToTextLikeStart, .CopyToOutFunc = CopyToTextLikeOutFunc, @@ -96,7 +97,7 @@ static const CopyToRoutine 
CopyToRoutineText = { }; /* CSV format */ -static const CopyToRoutine CopyToRoutineCSV = { +const CopyToRoutine CopyToRoutineCSV = { .type = T_CopyToRoutine, .CopyToStart = CopyToTextLikeStart, .CopyToOutFunc = CopyToTextLikeOutFunc, @@ -105,7 +106,7 @@ static const CopyToRoutine CopyToRoutineCSV = { }; /* binary format */ -static const CopyToRoutine CopyToRoutineBinary = { +const CopyToRoutine CopyToRoutineBinary = { .type = T_CopyToRoutine, .CopyToStart = CopyToBinaryStart, .CopyToOutFunc = CopyToBinaryOutFunc, @@ -117,28 +118,23 @@ static const CopyToRoutine CopyToRoutineBinary = { static const CopyToRoutine * CopyToGetRoutine(const CopyFormatOptions *opts) { - if (OidIsValid(opts->handler)) - { - Datum datum; - Node *routine; - - datum = OidFunctionCall1(opts->handler, BoolGetDatum(false)); - routine = (Node *) DatumGetPointer(datum); - if (routine == NULL || !IsA(routine, CopyToRoutine)) - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY handler function %s.%s did not return CopyToRoutine struct", - get_namespace_name(get_func_namespace(opts->handler)), - get_func_name(opts->handler)))); - return castNode(CopyToRoutine, routine); - } - else if (opts->csv_mode) - return &CopyToRoutineCSV; - else if (opts->binary) - return &CopyToRoutineBinary; + Oid handler = opts->handler; + Datum datum; + Node *routine; /* default is text */ - return &CopyToRoutineText; + if (!OidIsValid(handler)) + handler = F_TEXT_INTERNAL; + + datum = OidFunctionCall1(handler, BoolGetDatum(false)); + routine = (Node *) DatumGetPointer(datum); + if (routine == NULL || !IsA(routine, CopyToRoutine)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function %s.%s did not return CopyToRoutine struct", + get_namespace_name(get_func_namespace(handler)), + get_func_name(handler)))); + return castNode(CopyToRoutine, routine); } /* Implementation of the start callback for text and CSV formats */ diff --git 
a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index ba46bfa48a8..e038157eb74 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12572,4 +12572,15 @@ proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}', prosrc => 'pg_get_aios' }, +# COPY handlers +{ oid => '8100', descr => 'text COPY FORMAT handler', + proname => 'text', provolatile => 'i', prorettype => 'copy_handler', + proargtypes => 'internal', prosrc => 'copy_text_handler' }, +{ oid => '8101', descr => 'csv COPY FORMAT handler', + proname => 'csv', provolatile => 'i', prorettype => 'copy_handler', + proargtypes => 'internal', prosrc => 'copy_csv_handler' }, +{ oid => '8102', descr => 'binary COPY FORMAT handler', + proname => 'binary', provolatile => 'i', prorettype => 'copy_handler', + proargtypes => 'internal', prosrc => 'copy_binary_handler' }, + ] diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 6df1f8a3b9b..4525261fcc4 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -87,7 +87,7 @@ typedef struct CopyFormatOptions CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ int64 reject_limit; /* maximum tolerable number of errors */ List *convert_select; /* list of column names (can be NIL) */ - Oid handler; /* handler function for custom format routine */ + Oid handler; /* handler function */ } CopyFormatOptions; /* These are private in commands/copy[from|to]_internal.h */ diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index f9e27152313..51d181c3ab4 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -14,7 +14,7 @@ #ifndef COPYFROM_INTERNAL_H #define COPYFROM_INTERNAL_H -#include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/trigger.h" #include 
"nodes/miscnodes.h" @@ -197,4 +197,8 @@ extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); +extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineText; +extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineCSV; +extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineBinary; + #endif /* COPYFROM_INTERNAL_H */ diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h index 3bd9d702bf0..9faf97c718a 100644 --- a/src/include/commands/copyto_internal.h +++ b/src/include/commands/copyto_internal.h @@ -14,7 +14,7 @@ #ifndef COPYTO_INTERNAL_H #define COPYTO_INTERNAL_H -#include "commands/copy.h" +#include "commands/copyapi.h" #include "executor/execdesc.h" #include "executor/tuptable.h" #include "nodes/execnodes.h" @@ -83,4 +83,8 @@ typedef struct CopyToStateData void *opaque; /* private space */ } CopyToStateData; +extern PGDLLIMPORT const CopyToRoutine CopyToRoutineText; +extern PGDLLIMPORT const CopyToRoutine CopyToRoutineCSV; +extern PGDLLIMPORT const CopyToRoutine CopyToRoutineBinary; + #endif /* COPYTO_INTERNAL_H */ diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out index 47a875f0ab1..aa51e480b1d 100644 --- a/src/test/modules/test_copy_format/expected/test_copy_format.out +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -72,6 +72,41 @@ NOTICE: CopyToEnd -- Reset data. TRUNCATE copy_data; INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +-- test_copy_format extension installs text, csv and binary custom +-- COPY handlers to the public schema but they must not be +-- used. Builtin COPY handlers must be used. 
+-- public.text must not be used +COPY copy_data FROM stdin WITH (FORMAT text); +COPY copy_data TO stdout WITH (FORMAT text); +1 2 3 +12 34 56 +123 456 789 +COPY copy_data FROM stdin WITH (FORMAT 'pg_catalog.text'); +COPY copy_data TO stdout WITH (FORMAT 'pg_catalog.text'); +1 2 3 +12 34 56 +123 456 789 +-- public.csv must not be used +COPY copy_data FROM stdin WITH (FORMAT csv); +COPY copy_data TO stdout WITH (FORMAT csv); +1,2,3 +12,34,56 +123,456,789 +COPY copy_data FROM stdin WITH (FORMAT 'pg_catalog.csv'); +COPY copy_data TO stdout WITH (FORMAT 'pg_catalog.csv'); +1,2,3 +12,34,56 +123,456,789 +-- public.binary must not be used +\getenv abs_builddir PG_ABS_BUILDDIR +\set filename :abs_builddir '/results/binary.data' +COPY copy_data TO :'filename' WITH (FORMAT binary); +COPY copy_data FROM :'filename' WITH (FORMAT binary); +COPY copy_data TO :'filename' WITH (FORMAT 'pg_catalog.binary'); +COPY copy_data FROM :'filename' WITH (FORMAT 'pg_catalog.binary'); +-- Reset data. +TRUNCATE copy_data; +INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); DROP EXTENSION test_copy_format; -- Install custom COPY handlers to a schema that isn't included in -- search_path. diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql index c7beb2fb8ae..3b7f6e72e13 100644 --- a/src/test/modules/test_copy_format/sql/test_copy_format.sql +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -43,6 +43,38 @@ COPY copy_data TO stdout WITH (FORMAT 'test_copy_format'); TRUNCATE copy_data; INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +-- test_copy_format extension installs text, csv and binary custom +-- COPY handlers to the public schema but they must not be +-- used. Builtin COPY handlers must be used. + +-- public.text must not be used +COPY copy_data FROM stdin WITH (FORMAT text); +\. 
+COPY copy_data TO stdout WITH (FORMAT text); +COPY copy_data FROM stdin WITH (FORMAT 'pg_catalog.text'); +\. +COPY copy_data TO stdout WITH (FORMAT 'pg_catalog.text'); + +-- public.csv must not be used +COPY copy_data FROM stdin WITH (FORMAT csv); +\. +COPY copy_data TO stdout WITH (FORMAT csv); +COPY copy_data FROM stdin WITH (FORMAT 'pg_catalog.csv'); +\. +COPY copy_data TO stdout WITH (FORMAT 'pg_catalog.csv'); + +-- public.binary must not be used +\getenv abs_builddir PG_ABS_BUILDDIR +\set filename :abs_builddir '/results/binary.data' +COPY copy_data TO :'filename' WITH (FORMAT binary); +COPY copy_data FROM :'filename' WITH (FORMAT binary); +COPY copy_data TO :'filename' WITH (FORMAT 'pg_catalog.binary'); +COPY copy_data FROM :'filename' WITH (FORMAT 'pg_catalog.binary'); + +-- Reset data. +TRUNCATE copy_data; +INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); + DROP EXTENSION test_copy_format; diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql index c1a137181f8..bfa1900e828 100644 --- a/src/test/modules/test_copy_format/test_copy_format--1.0.sql +++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql @@ -22,3 +22,18 @@ CREATE FUNCTION test_copy_format_wrong_return_value(internal) RETURNS copy_handler AS 'MODULE_PATHNAME', 'test_copy_format_wrong_return_value' LANGUAGE C; + +CREATE FUNCTION text(internal) + RETURNS copy_handler + AS 'MODULE_PATHNAME', 'test_copy_format' + LANGUAGE C; + +CREATE FUNCTION csv(internal) + RETURNS copy_handler + AS 'MODULE_PATHNAME', 'test_copy_format' + LANGUAGE C; + +CREATE FUNCTION binary(internal) + RETURNS copy_handler + AS 'MODULE_PATHNAME', 'test_copy_format' + LANGUAGE C; -- 2.47.2
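Reviewer note on 0004: the reworked CopyFromGetRoutine()/CopyToGetRoutine() always go through a handler function, fall back to the text handler when no handler OID is set, and reject a handler that returns the wrong kind of struct. Below is a minimal standalone sketch of that control flow, assuming simplified stand-ins. Every Toy* name is hypothetical and none of this is the PostgreSQL API; a function pointer replaces the handler OID, and a NULL return replaces ereport(ERROR).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the NodeTag stored in each routine struct. */
typedef enum ToyTag
{
	TOY_COPY_FROM_ROUTINE = 1,
	TOY_COPY_TO_ROUTINE
} ToyTag;

/* Hypothetical stand-in for CopyFromRoutine / CopyToRoutine. */
typedef struct ToyRoutine
{
	ToyTag		type;
	const char *format;
} ToyRoutine;

/* A handler receives is_from and returns the matching routine. */
typedef const ToyRoutine *(*ToyHandler) (bool is_from);

static const ToyRoutine ToyFromText = {TOY_COPY_FROM_ROUTINE, "text"};
static const ToyRoutine ToyToText = {TOY_COPY_TO_ROUTINE, "text"};
static const ToyRoutine ToyFromCSV = {TOY_COPY_FROM_ROUTINE, "csv"};
static const ToyRoutine ToyToCSV = {TOY_COPY_TO_ROUTINE, "csv"};

const ToyRoutine *
toy_text_handler(bool is_from)
{
	return is_from ? &ToyFromText : &ToyToText;
}

const ToyRoutine *
toy_csv_handler(bool is_from)
{
	return is_from ? &ToyFromCSV : &ToyToCSV;
}

/* Deliberately wrong: ignores is_from and always returns a TO routine. */
const ToyRoutine *
toy_broken_handler(bool is_from)
{
	(void) is_from;
	return &ToyToText;
}

/*
 * Resolve the routine for one direction.  A NULL handler plays the role
 * of an invalid handler OID, so the text handler is used.  Returns NULL
 * where the real code would raise an error.
 */
const ToyRoutine *
toy_get_routine(ToyHandler handler, bool is_from)
{
	ToyTag		expected = is_from ? TOY_COPY_FROM_ROUTINE : TOY_COPY_TO_ROUTINE;
	const ToyRoutine *routine;

	/* default is text */
	if (handler == NULL)
		handler = toy_text_handler;

	routine = handler(is_from);
	if (routine == NULL || routine->type != expected)
		return NULL;
	return routine;
}

/* Usage example exercising the default, a normal handler, and a bad one. */
void
toy_get_routine_example(void)
{
	assert(toy_get_routine(NULL, true) == &ToyFromText);
	assert(toy_get_routine(toy_csv_handler, false) == &ToyToCSV);
	assert(toy_get_routine(toy_broken_handler, true) == NULL);
}
```

The point of the sketch is that the built-in formats no longer need a separate code path: the same validation that guards extension handlers also covers them.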
>From 6e014bf226713a2c9f37da4c4f337128c4392212 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <k...@clear-code.com> Date: Fri, 25 Apr 2025 18:49:41 +0900 Subject: [PATCH v40 5/6] Remove CopyFormatOptions::{binary,csv_mode} Because we can compute them from CopyFormatOptions::handler. --- src/backend/commands/copy.c | 61 +++++++++++++--------------- src/backend/commands/copyfrom.c | 2 +- src/backend/commands/copyfromparse.c | 7 ++-- src/backend/commands/copyto.c | 4 +- src/include/commands/copy.h | 2 - 5 files changed, 36 insertions(+), 40 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 38ed8bccacd..21db5e964cf 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -38,6 +38,7 @@ #include "parser/parse_relation.h" #include "utils/acl.h" #include "utils/builtins.h" +#include "utils/fmgroids.h" #include "utils/lsyscache.h" #include "utils/regproc.h" #include "utils/rel.h" @@ -508,6 +509,8 @@ ProcessCopyOptions(ParseState *pstate, bool on_error_specified = false; bool log_verbosity_specified = false; bool reject_limit_specified = false; + bool binary = false; + bool csv_mode = false; ListCell *option; /* Support external use for option sanity checking */ @@ -525,8 +528,6 @@ ProcessCopyOptions(ParseState *pstate, { char *format = defGetString(defel); List *qualified_format; - char *schema; - char *fmt; Oid arg_types[1]; Oid handler = InvalidOid; @@ -535,15 +536,6 @@ ProcessCopyOptions(ParseState *pstate, format_specified = true; qualified_format = stringToQualifiedNameList(format, NULL); - DeconstructQualifiedName(qualified_format, &schema, &fmt); - if (!schema || strcmp(schema, "pg_catalog") == 0) - { - if (strcmp(fmt, "csv") == 0) - opts_out->csv_mode = true; - else if (strcmp(fmt, "binary") == 0) - opts_out->binary = true; - } - arg_types[0] = INTERNALOID; handler = LookupFuncName(qualified_format, 1, arg_types, true); @@ -562,6 +554,11 @@ ProcessCopyOptions(ParseState *pstate, parser_errposition(pstate, 
defel->location))); opts_out->handler = handler; + if (opts_out->handler == F_CSV) + csv_mode = true; + else if (opts_out->handler == F_BINARY) + binary = true; + } else if (strcmp(defel->defname, "freeze") == 0) { @@ -716,31 +713,31 @@ ProcessCopyOptions(ParseState *pstate, * Check for incompatible options (must do these three before inserting * defaults) */ - if (opts_out->binary && opts_out->delim) + if (binary && opts_out->delim) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */ errmsg("cannot specify %s in BINARY mode", "DELIMITER"))); - if (opts_out->binary && opts_out->null_print) + if (binary && opts_out->null_print) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("cannot specify %s in BINARY mode", "NULL"))); - if (opts_out->binary && opts_out->default_print) + if (binary && opts_out->default_print) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("cannot specify %s in BINARY mode", "DEFAULT"))); /* Set defaults for omitted options */ if (!opts_out->delim) - opts_out->delim = opts_out->csv_mode ? "," : "\t"; + opts_out->delim = csv_mode ? "," : "\t"; if (!opts_out->null_print) - opts_out->null_print = opts_out->csv_mode ? "" : "\\N"; + opts_out->null_print = csv_mode ? "" : "\\N"; opts_out->null_print_len = strlen(opts_out->null_print); - if (opts_out->csv_mode) + if (csv_mode) { if (!opts_out->quote) opts_out->quote = "\""; @@ -788,7 +785,7 @@ ProcessCopyOptions(ParseState *pstate, * future-proofing. Likewise we disallow all digits though only octal * digits are actually dangerous. 
*/ - if (!opts_out->csv_mode && + if (!csv_mode && strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789", opts_out->delim[0]) != NULL) ereport(ERROR, @@ -796,43 +793,43 @@ ProcessCopyOptions(ParseState *pstate, errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim))); /* Check header */ - if (opts_out->binary && opts_out->header_line) + if (binary && opts_out->header_line) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */ errmsg("cannot specify %s in BINARY mode", "HEADER"))); /* Check quote */ - if (!opts_out->csv_mode && opts_out->quote != NULL) + if (!csv_mode && opts_out->quote != NULL) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */ errmsg("COPY %s requires CSV mode", "QUOTE"))); - if (opts_out->csv_mode && strlen(opts_out->quote) != 1) + if (csv_mode && strlen(opts_out->quote) != 1) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("COPY quote must be a single one-byte character"))); - if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0]) + if (csv_mode && opts_out->delim[0] == opts_out->quote[0]) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("COPY delimiter and quote must be different"))); /* Check escape */ - if (!opts_out->csv_mode && opts_out->escape != NULL) + if (!csv_mode && opts_out->escape != NULL) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), /*- translator: %s is the name of a COPY option, e.g. 
ON_ERROR */ errmsg("COPY %s requires CSV mode", "ESCAPE"))); - if (opts_out->csv_mode && strlen(opts_out->escape) != 1) + if (csv_mode && strlen(opts_out->escape) != 1) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("COPY escape must be a single one-byte character"))); /* Check force_quote */ - if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all)) + if (!csv_mode && (opts_out->force_quote || opts_out->force_quote_all)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */ @@ -846,8 +843,8 @@ ProcessCopyOptions(ParseState *pstate, "COPY FROM"))); /* Check force_notnull */ - if (!opts_out->csv_mode && (opts_out->force_notnull != NIL || - opts_out->force_notnull_all)) + if (!csv_mode && (opts_out->force_notnull != NIL || + opts_out->force_notnull_all)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */ @@ -862,8 +859,8 @@ ProcessCopyOptions(ParseState *pstate, "COPY TO"))); /* Check force_null */ - if (!opts_out->csv_mode && (opts_out->force_null != NIL || - opts_out->force_null_all)) + if (!csv_mode && (opts_out->force_null != NIL || + opts_out->force_null_all)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */ @@ -887,7 +884,7 @@ ProcessCopyOptions(ParseState *pstate, "NULL"))); /* Don't allow the CSV quote char to appear in the null string. */ - if (opts_out->csv_mode && + if (csv_mode && strchr(opts_out->null_print, opts_out->quote[0]) != NULL) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -923,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate, "DEFAULT"))); /* Don't allow the CSV quote char to appear in the default string. 
*/ - if (opts_out->csv_mode && + if (csv_mode && strchr(opts_out->default_print, opts_out->quote[0]) != NULL) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), @@ -940,7 +937,7 @@ ProcessCopyOptions(ParseState *pstate, errmsg("NULL specification and DEFAULT specification cannot be the same"))); } /* Check on_error */ - if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP) + if (binary && opts_out->on_error != COPY_ON_ERROR_STOP) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("only ON_ERROR STOP is allowed in BINARY mode"))); diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 74a8051c24c..b09b6b3e101 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -275,7 +275,7 @@ CopyFromErrorCallback(void *arg) cstate->cur_relname); return; } - if (cstate->opts.binary) + if (cstate->opts.handler == F_BINARY) { /* can't usefully display the data */ if (cstate->cur_attname) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index de68b53b000..148fa1f2062 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -73,6 +73,7 @@ #include "pgstat.h" #include "port/pg_bswap.h" #include "utils/builtins.h" +#include "utils/fmgroids.h" #include "utils/rel.h" #define ISOCTAL(c) (((c) >= '0') && ((c) <= '7')) @@ -171,7 +172,7 @@ ReceiveCopyBegin(CopyFromState cstate) { StringInfoData buf; int natts = list_length(cstate->attnumlist); - int16 format = (cstate->opts.binary ? 1 : 0); + int16 format = (cstate->opts.handler == F_BINARY ? 
1 : 0); int i; pq_beginmessage(&buf, PqMsg_CopyInResponse); @@ -758,7 +759,7 @@ bool NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) { return NextCopyFromRawFieldsInternal(cstate, fields, nfields, - cstate->opts.csv_mode); + cstate->opts.handler == F_CSV); } /* @@ -785,7 +786,7 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields bool done; /* only available for text or csv input */ - Assert(!cstate->opts.binary); + Assert(cstate->opts.handler != F_BINARY); /* on input check that the header line is correct if needed */ if (cstate->cur_lineno == 0 && cstate->opts.header_line) diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 4e1b154cad2..4f8f5813172 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -167,7 +167,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc) colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); - if (cstate->opts.csv_mode) + if (cstate->opts.handler == F_CSV) CopyAttributeOutCSV(cstate, colname, false); else CopyAttributeOutText(cstate, colname); @@ -344,7 +344,7 @@ SendCopyBegin(CopyToState cstate) { StringInfoData buf; int natts = list_length(cstate->attnumlist); - int16 format = (cstate->opts.binary ? 1 : 0); + int16 format = (cstate->opts.handler == F_BINARY ? 1 : 0); int i; pq_beginmessage(&buf, PqMsg_CopyOutResponse); diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 4525261fcc4..04f8f5ef1b2 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -61,9 +61,7 @@ typedef struct CopyFormatOptions /* parameters from the COPY command */ int file_encoding; /* file or remote side's character encoding, * -1 if not specified */ - bool binary; /* binary format? */ bool freeze; /* freeze rows on loading? */ - bool csv_mode; /* Comma Separated Value format? */ CopyHeaderChoice header_line; /* header line? 
*/ char *null_print; /* NULL marker string (server encoding!) */ int null_print_len; /* length of same */ -- 2.47.2
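Reviewer note on 0005: once the handler OID identifies the format, the binary and csv_mode flags become derived values rather than stored state. Below is a minimal standalone sketch of that idea. The numeric values come from the pg_proc.dat hunk in 0004 (8100 to 8102), but the TOY_F_* macros and toy_* helpers are hypothetical names for illustration, not the generated fmgroids.h symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins so the sketch compiles without PostgreSQL headers. */
typedef unsigned int Oid;
#define InvalidOid		((Oid) 0)

/* Hypothetical names; the values mirror the OIDs proposed in 0004. */
#define TOY_F_TEXT		((Oid) 8100)
#define TOY_F_CSV		((Oid) 8101)
#define TOY_F_BINARY	((Oid) 8102)

/* Derived flags: nothing is stored besides the handler OID. */
bool
toy_is_csv(Oid handler)
{
	return handler == TOY_F_CSV;
}

bool
toy_is_binary(Oid handler)
{
	return handler == TOY_F_BINARY;
}

/* Default delimiter: comma for CSV, tab for everything else. */
const char *
toy_default_delim(Oid handler)
{
	return toy_is_csv(handler) ? "," : "\t";
}

/* Overall format code sent in the copy-begin message: 1 only for binary. */
int
toy_wire_format(Oid handler)
{
	return toy_is_binary(handler) ? 1 : 0;
}

/* Usage example: an unset handler behaves like the text format. */
void
toy_format_flags_example(void)
{
	assert(!toy_is_csv(InvalidOid) && !toy_is_binary(InvalidOid));
	assert(toy_default_delim(TOY_F_CSV)[0] == ',');
	assert(toy_wire_format(TOY_F_BINARY) == 1);
}
```

One consequence worth noting: a custom handler is neither CSV nor binary under these checks, so it gets the text-format defaults and option validation.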
>From 421c34b76a5e9fe45b49bdbe52ecda4d0f638617 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <k...@clear-code.com> Date: Wed, 19 Mar 2025 11:46:34 +0900 Subject: [PATCH v40 6/6] Add document how to write a COPY handler This is WIP because we haven't decided our API yet. Co-authored-by: David G. Johnston <david.g.johns...@gmail.com> --- doc/src/sgml/copy-handler.sgml | 394 +++++++++++++++++++++++++++++++++ doc/src/sgml/filelist.sgml | 1 + doc/src/sgml/postgres.sgml | 1 + src/include/commands/copyapi.h | 9 +- 4 files changed, 401 insertions(+), 4 deletions(-) create mode 100644 doc/src/sgml/copy-handler.sgml diff --git a/doc/src/sgml/copy-handler.sgml b/doc/src/sgml/copy-handler.sgml new file mode 100644 index 00000000000..5bc87d16662 --- /dev/null +++ b/doc/src/sgml/copy-handler.sgml @@ -0,0 +1,394 @@ +<!-- doc/src/sgml/copy-handler.sgml --> + +<chapter id="copy-handler"> + <title>Writing a Copy Handler</title> + + <indexterm zone="copy-handler"> + <primary><literal>COPY</literal> handler</primary> + </indexterm> + + <para> + <productname>PostgreSQL</productname> supports + custom <link linkend="sql-copy"><literal>COPY</literal></link> handlers; + adding additional <replaceable>format_name</replaceable> options to + the <literal>FORMAT</literal> clause. + </para> + + <para> + At the SQL level, a copy handler method is represented by a single SQL + function (see <xref linkend="sql-createfunction"/>), typically implemented in + C, having the signature +<synopsis> +<replaceable>format_name</replaceable>(internal) RETURNS <literal>copy_handler</literal> +</synopsis> + The function's name is then accepted as a + valid <replaceable>format_name</replaceable>. The return + pseudo-type <literal>copy_handler</literal> informs the system that this + function needs to be registered as a copy handler. + The <type>internal</type> argument is a dummy that prevents this function + from being called directly from an SQL command. 
Because the handler + implementation must be server-lifetime immutable, this SQL function's + volatility should be marked immutable. The <literal>link_symbol</literal> + for this function is the name of the implementation function, described + next. + </para> + + <para> + The implementation function signature expected for the function named + in the <literal>link_symbol</literal> is: +<synopsis> +Datum +<replaceable>copy_format_handler</replaceable>(PG_FUNCTION_ARGS) +</synopsis> + The convention for the name is to replace the word + <replaceable>format</replaceable> in the placeholder above with the value given + to <replaceable>format_name</replaceable> in the SQL function. + The first argument is a <type>boolean</type>. If <literal>true</literal>, the handler + must provide a pointer to its implementation of <literal>COPY FROM</literal> + (a <type>CopyFromRoutine *</type>). If <literal>false</literal>, the handler + must provide a pointer to its implementation of <literal>COPY TO</literal> + (a <type>CopyToRoutine *</type>). These structs are declared in + <filename>src/include/commands/copyapi.h</filename>. + </para> + + <para> + The structs hold pointers to implementation functions for initializing, + starting, processing rows, and ending a copy operation. The specific + structures vary a bit between <literal>COPY FROM</literal> and + <literal>COPY TO</literal>, so the next two sections describe each + in detail. + </para> + + <sect1 id="copy-handler-from"> + <title>Copy From Handler</title> + + <para> + The opening to this chapter describes how the executor will call the main + handler function with, in this case, + a <type>boolean</type> <literal>true</literal>, and expect to receive a + <type>CopyFromRoutine *</type> <type>Datum</type>. This section describes + the components of the <type>CopyFromRoutine</type> struct.
+ </para> + + <para> +<programlisting> +void +CopyFromInFunc(CopyFromState cstate, + Oid atttypid, + FmgrInfo *finfo, + Oid *typioparam); +</programlisting> + + This sets input function information for the + given <literal>atttypid</literal> attribute. This function is called once + at the beginning of <literal>COPY FROM</literal>. If + this <literal>COPY</literal> handler doesn't use any input functions, this + function doesn't need to do anything. + + <variablelist> + <varlistentry> + <term><literal>CopyFromState *cstate</literal></term> + <listitem> + <para> + This is an internal struct that contains all the state variables used + throughout a <literal>COPY FROM</literal> operation. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>Oid atttypid</literal></term> + <listitem> + <para> + This is the OID of data type used by the relation's attribute. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>FmgrInfo *finfo</literal></term> + <listitem> + <para> + This can be optionally filled to provide the catalog information of + the input function. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>Oid *typioparam</literal></term> + <listitem> + <para> + This can be optionally filled to define the OID of the type to + pass to the input function. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> +<programlisting> +void +CopyFromStart(CopyFromState cstate, + TupleDesc tupDesc); +</programlisting> + + This starts a <literal>COPY FROM</literal>. This function is called once at + the beginning of <literal>COPY FROM</literal>. + + <variablelist> + <varlistentry> + <term><literal>CopyFromState *cstate</literal></term> + <listitem> + <para> + This is an internal struct that contains all the state variables used + throughout a <literal>COPY FROM</literal> operation. 
+ </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>TupleDesc tupDesc</literal></term> + <listitem> + <para> + This is the tuple descriptor of the relation into which the data is + copied. This can be used for any initialization steps required by a + format. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> +<programlisting> +bool +CopyFromOneRow(CopyFromState cstate, + ExprContext *econtext, + Datum *values, + bool *nulls); +</programlisting> + + This reads one row from the source and fills <literal>values</literal> + and <literal>nulls</literal>. If there are one or more tuples to be read, + this must return <literal>true</literal>. If there are no more tuples to + read, this must return <literal>false</literal>. + + <variablelist> + <varlistentry> + <term><literal>CopyFromState *cstate</literal></term> + <listitem> + <para> + This is an internal struct that contains all the state variables used + throughout a <literal>COPY FROM</literal> operation. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>ExprContext *econtext</literal></term> + <listitem> + <para> + This is used to evaluate the default expression for each column that is + either not read from the file or is using + the <literal>DEFAULT</literal> option of <literal>COPY + FROM</literal>. It is <literal>NULL</literal> if no default values are + used. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>Datum *values</literal></term> + <listitem> + <para> + This is an output variable to store the column values of the row read. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>bool *nulls</literal></term> + <listitem> + <para> + This is an output variable to store whether the read columns + are <literal>NULL</literal> or not.
+ </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> +<programlisting> +void +CopyFromEnd(CopyFromState cstate); +</programlisting> + + This ends a <literal>COPY FROM</literal>. This function is called once at + the end of <literal>COPY FROM</literal>. + + <variablelist> + <varlistentry> + <term><literal>CopyFromState *cstate</literal></term> + <listitem> + <para> + This is an internal struct that contains all the state variables used + throughout a <literal>COPY FROM</literal> operation. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> + TODO: Add CopyFromStateGetData() and CopyFromSkipErrowRow()? + </para> + </sect1> + + <sect1 id="copy-handler-to"> + <title>Copy To Handler</title> + + <para> + The <literal>COPY</literal> handler function for <literal>COPY + TO</literal> returns a <type>CopyToRoutine</type> struct containing + pointers to the functions described below. All functions are required. + </para> + + <para> +<programlisting> +void +CopyToOutFunc(CopyToState cstate, + Oid atttypid, + FmgrInfo *finfo); +</programlisting> + + This sets output function information for the + given <literal>atttypid</literal> attribute. This function is called once + at the beginning of <literal>COPY TO</literal>. If + this <literal>COPY</literal> handler doesn't use any output functions, this + function doesn't need to do anything. + + <variablelist> + <varlistentry> + <term><literal>CopyToState *cstate</literal></term> + <listitem> + <para> + This is an internal struct that contains all the state variables used + throughout a <literal>COPY TO</literal> operation. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>Oid atttypid</literal></term> + <listitem> + <para> + This is the OID of data type used by the relation's attribute. 
+ </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>FmgrInfo *finfo</literal></term> + <listitem> + <para> + This can be optionally filled to provide the catalog information of + the output function. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> +<programlisting> +void +CopyToStart(CopyToState cstate, + TupleDesc tupDesc); +</programlisting> + + This starts a <literal>COPY TO</literal>. This function is called once at + the beginning of <literal>COPY TO</literal>. + + <variablelist> + <varlistentry> + <term><literal>CopyToState *cstate</literal></term> + <listitem> + <para> + This is an internal struct that contains all the state variables used + throughout a <literal>COPY TO</literal> operation. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>TupleDesc tupDesc</literal></term> + <listitem> + <para> + This is the tuple descriptor of the relation from which the data is read. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> +<programlisting> +void +CopyToOneRow(CopyToState cstate, + TupleTableSlot *slot); +</programlisting> + + This writes one row stored in <literal>slot</literal> to the destination. + + <variablelist> + <varlistentry> + <term><literal>CopyToState *cstate</literal></term> + <listitem> + <para> + This is an internal struct that contains all the state variables used + throughout a <literal>COPY TO</literal> operation. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>TupleTableSlot *slot</literal></term> + <listitem> + <para> + This is the slot that holds the row to be written. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> +<programlisting> +void +CopyToEnd(CopyToState cstate); +</programlisting> + + This ends a <literal>COPY TO</literal>. This function is called once at + the end of <literal>COPY TO</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+   TODO: Add CopyToStateFlush()?
+  </para>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index fef9584f908..700cf22b502 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -107,6 +107,7 @@
 <!ENTITY storage SYSTEM "storage.sgml">
 <!ENTITY transaction SYSTEM "xact.sgml">
 <!ENTITY tablesample-method SYSTEM "tablesample-method.sgml">
+<!ENTITY copy-handler SYSTEM "copy-handler.sgml">
 <!ENTITY wal-for-extensions SYSTEM "wal-for-extensions.sgml">
 <!ENTITY generic-wal SYSTEM "generic-wal.sgml">
 <!ENTITY custom-rmgr SYSTEM "custom-rmgr.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index af476c82fcc..8ba319ae2df 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -254,6 +254,7 @@ break is not needed in a wider output rendering.
   &plhandler;
   &fdwhandler;
   &tablesample-method;
+  &copy-handler;
   &custom-scan;
   &geqo;
   &tableam;
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 500ece7d5bb..24710cb667a 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -28,10 +28,10 @@ typedef struct CopyToRoutine
 	 * Set output function information. This callback is called once at the
 	 * beginning of COPY TO.
 	 *
+	 * 'atttypid' is the OID of data type used by the relation's attribute.
+	 *
 	 * 'finfo' can be optionally filled to provide the catalog information of
 	 * the output function.
-	 *
-	 * 'atttypid' is the OID of data type used by the relation's attribute.
 	 */
 	void		(*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
 								  FmgrInfo *finfo);
@@ -70,12 +70,13 @@ typedef struct CopyFromRoutine
 	 * Set input function information. This callback is called once at the
 	 * beginning of COPY FROM.
 	 *
+	 * 'atttypid' is the OID of data type used by the relation's attribute.
+	 *
 	 * 'finfo' can be optionally filled to provide the catalog information of
 	 * the input function.
 	 *
 	 * 'typioparam' can be optionally filled to define the OID of the type to
-	 * pass to the input function.'atttypid' is the OID of data type used by
-	 * the relation's attribute.
+	 * pass to the input function.
 	 */
 	void		(*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
 								   FmgrInfo *finfo, Oid *typioparam);
-- 
2.47.2
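
As a reading aid for the "Copy To Handler" section in the documentation draft, here is a minimal standalone sketch of the order in which the four callbacks are expected to run. It does not use PostgreSQL headers: DemoCopyToState, DemoCopyToRoutine, demo_run_copy_to() and the simplified callback arguments are all hypothetical stand-ins, and the driver assumes that CopyToOutFunc runs once per output column before CopyToStart. Only the four slot names come from CopyToRoutine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the internal COPY TO state. It only records
 * which callbacks ran, one character per call.
 */
typedef struct DemoCopyToState
{
	char		trace[32];
} DemoCopyToState;

/* Same four slots as CopyToRoutine, with simplified (assumed) arguments. */
typedef struct DemoCopyToRoutine
{
	void		(*CopyToOutFunc) (DemoCopyToState *cstate, unsigned int atttypid);
	void		(*CopyToStart) (DemoCopyToState *cstate, int natts);
	void		(*CopyToOneRow) (DemoCopyToState *cstate, int rowno);
	void		(*CopyToEnd) (DemoCopyToState *cstate);
} DemoCopyToRoutine;

/* Append one character to the trace; drop it if the buffer is full. */
static void
demo_trace(DemoCopyToState *cstate, char c)
{
	size_t		len = strlen(cstate->trace);

	if (len + 1 < sizeof(cstate->trace))
	{
		cstate->trace[len] = c;
		cstate->trace[len + 1] = '\0';
	}
}

static void
demo_out_func(DemoCopyToState *cstate, unsigned int atttypid)
{
	(void) atttypid;			/* a real handler may look up an output function */
	demo_trace(cstate, 'O');
}

static void
demo_start(DemoCopyToState *cstate, int natts)
{
	(void) natts;				/* a real handler may write a file header here */
	demo_trace(cstate, 'S');
}

static void
demo_one_row(DemoCopyToState *cstate, int rowno)
{
	(void) rowno;				/* a real handler serializes the row here */
	demo_trace(cstate, 'R');
}

static void
demo_end(DemoCopyToState *cstate)
{
	demo_trace(cstate, 'E');	/* a real handler may write a trailer here */
}

static const DemoCopyToRoutine demo_routine = {
	.CopyToOutFunc = demo_out_func,
	.CopyToStart = demo_start,
	.CopyToOneRow = demo_one_row,
	.CopyToEnd = demo_end,
};

/*
 * Hypothetical driver: per-column setup, then start, then one call per row,
 * then end. Returns the recorded trace.
 */
const char *
demo_run_copy_to(DemoCopyToState *cstate, int natts, int nrows)
{
	int			i;

	assert(cstate != NULL);
	cstate->trace[0] = '\0';

	for (i = 0; i < natts; i++)
		demo_routine.CopyToOutFunc(cstate, (unsigned int) i);
	demo_routine.CopyToStart(cstate, natts);
	for (i = 0; i < nrows; i++)
		demo_routine.CopyToOneRow(cstate, i);
	demo_routine.CopyToEnd(cstate);

	return cstate->trace;
}
```

With two columns and three rows, demo_run_copy_to() leaves OOSRRRE in the trace: two output-function setups, one start, three rows, one end. A real handler would return a CopyToRoutine with the same four slots filled in.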