Hi,

In <CAKFQuwbhSssKTJyeYo9rn20zffV3L7wdQSbEQ8zwRfC=uxl...@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 31 Mar 2025 10:05:34 -0700,
  "David G. Johnston" <david.g.johns...@gmail.com> wrote:
> The CopyFromInFunc API allows for each attribute to somehow
> have its I/O format individualized. But I don't see how that is practical
> or useful, and it adds burden on API users.

If an extension wants to use I/O routines, it can use the CopyFromInFunc API. Otherwise, it can provide an empty function.

For example, https://github.com/MasahikoSawada/pg_copy_jsonlines/blob/master/copy_jsonlines.c uses the CopyFromInFunc API, but https://github.com/kou/pg-copy-arrow/blob/main/copy_arrow.cc uses an empty function for it.

By "it adds burden", you mean that "defining an empty function is inconvenient", right?

See also our past discussion of this design:
https://www.postgresql.org/message-id/ZbijVn9_51mljMAG%40paquier.xyz

> Keeping empty options does not strike as a bad idea, because this
> forces extension developers to think about this code path rather than
> just ignore it.

> I suggest we remove both .CopyFromInFunc and .CopyFromStart/End and add a
> property to CopyFromRoutine (.ioMode?) with values of either Copy_IO_Text
> or Copy_IO_Binary and then just branch to either:
>
> CopyFromTextLikeInFunc & CopyFromTextLikeStart/End
> or
> CopyFromBinaryInFunc & CopyFromStart/End
>
> So, in effect, the only method an extension needs to write is converting
> to/from the 'serialized' form to the text/binary form (text being near
> unanimous).

I object to this API. If we choose it, we can only create custom COPY formats that are compatible with PostgreSQL's text/binary form. For example, the jsonlines and Apache Arrow formats mentioned above couldn't be implemented. It's meaningless to introduce this custom COPY format mechanism with the suggested API.
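To make the CopyFromInFunc trade-off concrete, here is a minimal standalone C sketch. It deliberately avoids PostgreSQL headers: Datum, CopyFromStateData, the four-callback CopyFromRoutine layout, and every function name below (int_in, text_in_func, native_in_func, run_copy, ...) are hypothetical stand-ins that only mirror the shape of the API under discussion, not its real signatures. It puts a text-like format, which uses CopyFromInFunc to choose per-attribute input functions, next to a native format (in the spirit of pg-copy-arrow) whose CopyFromInFunc is empty and which fills Datums directly in CopyFromOneRow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins, NOT PostgreSQL's real definitions. */
typedef uintptr_t Datum;
typedef Datum (*InputFunc) (const char *str);

typedef struct CopyFromStateData
{
    int          natts;         /* number of attributes per row */
    InputFunc   *in_functions;  /* filled by CopyFromInFunc (text-like formats) */
    const char **fields;        /* fake raw input, natts cells per row; NULL = SQL NULL */
    int          row;           /* rows produced so far */
    int          nrows;         /* rows available in the fake input */
} CopyFromStateData;

/* Hypothetical callback table mirroring the shape of the proposed CopyFromRoutine. */
typedef struct CopyFromRoutine
{
    void (*CopyFromInFunc) (CopyFromStateData *cstate, int attnum);
    void (*CopyFromStart) (CopyFromStateData *cstate);
    bool (*CopyFromOneRow) (CopyFromStateData *cstate, Datum *values, bool *nulls);
    void (*CopyFromEnd) (CopyFromStateData *cstate);
} CopyFromRoutine;

static Datum int_in(const char *str) { return (Datum) strtol(str, NULL, 10); }
static void noop(CopyFromStateData *cstate) { (void) cstate; }

/* Text-like format: picks a per-attribute input function, then parses strings per row. */
static void text_in_func(CopyFromStateData *cstate, int attnum)
{
    cstate->in_functions[attnum] = int_in;
}

static bool text_one_row(CopyFromStateData *cstate, Datum *values, bool *nulls)
{
    if (cstate->row >= cstate->nrows)
        return false;
    for (int i = 0; i < cstate->natts; i++)
    {
        const char *cell = cstate->fields[cstate->row * cstate->natts + i];

        nulls[i] = (cell == NULL);
        values[i] = nulls[i] ? 0 : cstate->in_functions[i](cell);
    }
    cstate->row++;
    return true;
}

/* Native format (think Arrow): empty CopyFromInFunc, fills Datums directly. */
static void native_in_func(CopyFromStateData *cstate, int attnum)
{
    (void) cstate;              /* intentionally empty: no text I/O needed */
    (void) attnum;
}

static bool native_one_row(CopyFromStateData *cstate, Datum *values, bool *nulls)
{
    if (cstate->row >= cstate->nrows)
        return false;
    for (int i = 0; i < cstate->natts; i++)
    {
        /* Stand-in for decoding a column value straight into a Datum. */
        values[i] = (Datum) (cstate->row * 10 + i);
        nulls[i] = false;
    }
    cstate->row++;
    return true;
}

static const CopyFromRoutine TextLikeRoutine = {text_in_func, noop, text_one_row, noop};
static const CopyFromRoutine NativeRoutine = {native_in_func, noop, native_one_row, noop};

/* Simplified core driver: returns the sum of all non-NULL values it received. */
static long run_copy(const CopyFromRoutine *routine, CopyFromStateData *cstate)
{
    Datum values[8];
    bool  nulls[8];
    long  sum = 0;

    assert(cstate->natts <= 8);
    for (int i = 0; i < cstate->natts; i++)
        routine->CopyFromInFunc(cstate, i);
    routine->CopyFromStart(cstate);
    while (routine->CopyFromOneRow(cstate, values, nulls))
        for (int i = 0; i < cstate->natts; i++)
            if (!nulls[i])
                sum += (long) values[i];
    routine->CopyFromEnd(cstate);
    return sum;
}
```

In this sketch, the only cost the native format pays for keeping CopyFromInFunc in the routine table is a two-line empty function; it never touches in_functions, whereas an ioMode-style switch would route every format through a text or binary input path.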
> It seems to me that CopyFromOneRow could simply produce a *string collection,
> one cell per attribute, and NextCopyFrom could do all of the above on a
> for-loop over *string

You suggest that we use a string collection instead of a Datum collection in CopyFromOneRow() and convert the string collection to a Datum collection in NextCopyFrom(), right?

I object to this API because it adds needless string <-> Datum conversion overhead. For example, https://github.com/MasahikoSawada/pg_copy_jsonlines/blob/master/copy_jsonlines.c parses a JSON value into a Datum. With this API, the extension would have to convert the parsed Datum to a string, and then NextCopyFrom() would convert that string back to a Datum. That would slow down custom COPY formats.

I want this custom COPY format feature for performance. So APIs that impose needless overhead on formats other than text/CSV/binary aren't acceptable to me.

Thanks,
-- 
kou