Hi, In <CAEG8a3+jG_NKOUmcxDyEX2xSggBXReZ4H=e3rfsutedy88a...@mail.gmail.com> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Dec 2023 10:58:05 +0800, Junwang Zhao <zhjw...@gmail.com> wrote:
>> 1. Add an opaque space for custom COPY TO handler >> * Add CopyToState{Get,Set}Opaque() >> >> https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944 >> >> 2. Export CopyToState::attnumlist >> * Add CopyToStateGetAttNumList() >> >> https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688 >> >> 3. Export CopySend*() >> * Rename CopySend*() to CopyToStateSend*() and export them >> * Exception: CopySendEndOfRow() to CopyToStateFlush() because >> it just flushes the internal buffer now. >> >> https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e >> > I guess the purpose of these helpers is to avoid expose CopyToState to > copy.h, Yes. > but I > think expose CopyToState to user might make life easier, users might want to > use > the memory contexts of the structure (though I agree not all the > fields are necessary > for extension handers). OK. I don't object it as I said in another e-mail: https://www.postgresql.org/message-id/flat/20240110.120644.1876591646729327180.kou%40clear-code.com#d923173e9625c20319750155083cbd72 >> 2. Need an opaque space like IndexScanDesc::opaque does >> >> * A custom COPY TO handler needs to keep its data > > I once thought users might want to parse their own options, maybe this > is a use case for this opaque space. Good catch! I forgot to suggest a callback for custom format options. How about the following API? ---- ... typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel); ... typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem *defel); typedef struct CopyToFormatRoutine { ... CopyToProcessOption_function process_option_fn; } CopyToFormatRoutine; typedef struct CopyFromFormatRoutine { ... CopyFromProcessOption_function process_option_fn; } CopyFromFormatRoutine; ---- ---- diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index e7597894bf..1aa8b62551 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -416,6 +416,7 @@ void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, + void *cstate, /* CopyToState* for !is_from, CopyFromState* for is_from */ List *options) { bool format_specified = false; @@ -593,11 +594,19 @@ ProcessCopyOptions(ParseState *pstate, parser_errposition(pstate, defel->location))); } else - ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("option \"%s\" not recognized", - defel->defname), - parser_errposition(pstate, defel->location))); + { + bool processed; + if (is_from) + processed = opts_out->from_ops->process_option_fn(cstate, defel); + else + processed = opts_out->to_ops->process_option_fn(cstate, defel); + if (!processed) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("option \"%s\" not recognized", + defel->defname), + parser_errposition(pstate, defel->location))); + } } /* ---- > For the name, I thought private_data might be a better candidate than > opaque, but I do not insist. I don't have a strong opinion for this. Here are the number of headers that use "private_data" and "opaque": $ grep -r private_data --files-with-matches src/include | wc -l 6 $ grep -r opaque --files-with-matches src/include | wc -l 38 It seems that we use "opaque" than "private_data" in general. but it seems that we use "opaque" than "private_data" in our code. > Do you use the arrow library to control the memory? Yes. > Is there a way that > we can let the arrow use postgres' memory context? Yes. Apache Arrow C++ provides a memory pool feature and we can implement PostgreSQL's memory context based memory pool for this. (But this is a custom COPY TO/FROM handler's implementation details.) > I'm not sure this > is necessary, just raise the question for discussion. Could you clarify what should we discuss? We should require that COPY TO/FROM handlers should use PostgreSQL's memory context for all internal memory allocations? > +PG_FUNCTION_INFO_V1(copy_testfmt_handler); > +Datum > +copy_testfmt_handler(PG_FUNCTION_ARGS) > +{ > + bool is_from = PG_GETARG_BOOL(0); > + CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);; > + > > extra semicolon. I noticed it too :-) But I ignored it because the current implementation is only for discussion. We know that it may be dirty. Thanks, -- kou