Hi,

In <CAEG8a3+jG_NKOUmcxDyEX2xSggBXReZ4H=e3rfsutedy88a...@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 22 Dec 2023 10:58:05 +0800,
  Junwang Zhao <zhjw...@gmail.com> wrote:

>> 1. Add an opaque space for custom COPY TO handler
>>    * Add CopyToState{Get,Set}Opaque()
>>    
>> https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
>>
>> 2. Export CopyToState::attnumlist
>>    * Add CopyToStateGetAttNumList()
>>    
>> https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
>>
>> 3. Export CopySend*()
>>    * Rename CopySend*() to CopyToStateSend*() and export them
>>    * Exception: CopySendEndOfRow() to CopyToStateFlush() because
>>      it just flushes the internal buffer now.
>>    
>> https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
>>
> I guess the purpose of these helpers is to avoid expose CopyToState to
> copy.h,

Yes.

>         but I
> think expose CopyToState to user might make life easier, users might want to 
> use
> the memory contexts of the structure (though I agree not all the
> fields are necessary
> for extension handers).

OK. I don't object it as I said in another e-mail:
https://www.postgresql.org/message-id/flat/20240110.120644.1876591646729327180.kou%40clear-code.com#d923173e9625c20319750155083cbd72

>> 2. Need an opaque space like IndexScanDesc::opaque does
>>
>>    * A custom COPY TO handler needs to keep its data
> 
> I once thought users might want to parse their own options, maybe this
> is a use case for this opaque space.

Good catch! I forgot to suggest a callback for custom format
options. How about the following API?

----
...
typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem 
*defel);

...
typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem 
*defel);

typedef struct CopyToFormatRoutine
{
        ...
        CopyToProcessOption_function process_option_fn;
} CopyToFormatRoutine;

typedef struct CopyFromFormatRoutine
{
        ...
        CopyFromProcessOption_function process_option_fn;
} CopyFromFormatRoutine;
----

----
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e7597894bf..1aa8b62551 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -416,6 +416,7 @@ void
 ProcessCopyOptions(ParseState *pstate,
                                   CopyFormatOptions *opts_out,
                                   bool is_from,
+                                  void *cstate, /* CopyToState* for !is_from, 
CopyFromState* for is_from */
                                   List *options)
 {
        bool            format_specified = false;
@@ -593,11 +594,19 @@ ProcessCopyOptions(ParseState *pstate,
                                                 parser_errposition(pstate, 
defel->location)));
                }
                else
-                       ereport(ERROR,
-                                       (errcode(ERRCODE_SYNTAX_ERROR),
-                                        errmsg("option \"%s\" not recognized",
-                                                       defel->defname),
-                                        parser_errposition(pstate, 
defel->location)));
+               {
+                       bool processed;
+                       if (is_from)
+                               processed = 
opts_out->from_ops->process_option_fn(cstate, defel);
+                       else
+                               processed = 
opts_out->to_ops->process_option_fn(cstate, defel);
+                       if (!processed)
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_SYNTAX_ERROR),
+                                                errmsg("option \"%s\" not 
recognized",
+                                                               defel->defname),
+                                                parser_errposition(pstate, 
defel->location)));
+               }
        }
 
        /*
----

> For the name, I thought private_data might be a better candidate than
> opaque, but I do not insist.

I don't have a strong opinion for this. Here are the number
of headers that use "private_data" and "opaque":

$ grep -r private_data --files-with-matches src/include | wc -l
6
$ grep -r opaque --files-with-matches src/include | wc -l
38

It seems that we use "opaque" than "private_data" in general.

but it seems that we use
"opaque" than "private_data" in our code.

> Do you use the arrow library to control the memory?

Yes.

>                                                     Is there a way that
> we can let the arrow use postgres' memory context?

Yes. Apache Arrow C++ provides a memory pool feature and we
can implement PostgreSQL's memory context based memory pool
for this. (But this is a custom COPY TO/FROM handler's
implementation details.)

>                                                    I'm not sure this
> is necessary, just raise the question for discussion.

Could you clarify what should we discuss? We should require
that COPY TO/FROM handlers should use PostgreSQL's memory
context for all internal memory allocations?

> +PG_FUNCTION_INFO_V1(copy_testfmt_handler);
> +Datum
> +copy_testfmt_handler(PG_FUNCTION_ARGS)
> +{
> + bool is_from = PG_GETARG_BOOL(0);
> + CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);;
> +
> 
> extra semicolon.

I noticed it too :-)
But I ignored it because the current implementation is only
for discussion. We know that it may be dirty.


Thanks,
-- 
kou


Reply via email to