On Fri, May 2, 2025 at 7:20 PM Sutou Kouhei <k...@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoBuEqcz2_+dpA3WTiDUF=fgudpbkwm+nvh+qht-k4p...@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Thu, 1 May 2025 12:15:30 -0700,
>   Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> > One of the primary considerations we need to address is the treatment
> > of the specified format name. The current patch set utilizes built-in
> > formats (namely 'csv', 'text', and 'binary') when the format name is
> > either unqualified or explicitly specified with 'pg_catalog' as the
> > schema. In all other cases, we search for custom format handler
> > functions based on the search_path. To be frank, I have reservations
> > about this interface design, as the dependence of the specified custom
> > format name on the search_path could potentially confuse users.
>
> How about requiring schema for all custom formats?
>
> Valid:
>
>   COPY ... TO ... (FORMAT 'text');
>   COPY ... TO ... (FORMAT 'my_schema.jsonlines');
>
> Invalid:
>
>   COPY ... TO ... (FORMAT 'jsonlines'); -- no schema
>   COPY ... TO ... (FORMAT 'pg_catalog.text'); -- needless schema
>
> If we require "schema" for all custom formats, we don't need
> to depend on search_path.

I'm concerned that users cannot use the same format name in the FORMAT
option depending on which schema the handler function is created.

>
> > In light of these concerns, I've been contemplating alternative
> > interface designs. One promising approach would involve registering
> > custom copy formats via a C function during module loading
> > (specifically, in _PG_init()). This method would require extension
> > authors to invoke a registration function, say
> > RegisterCustomCopyFormat(), in _PG_init() as follows:
> >
> > JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines",
> >                                              &JsonLinesCopyToRoutine,
> >                                              &JsonLinesCopyFromRoutine);
> >
> > The registration function would validate the format name and store it
> > in TopMemoryContext. It would then return a unique identifier that can
> > be used subsequently to reference the custom copy format extension.
>
> I don't object the suggested interface because I don't have
> a strong opinion how to implement this feature.
>
> Why do we need to assign a unique ID? For performance? For
> RegisterCustomCopyFormatOption()?

I think it's required for monitoring purposes for example. For
instance, we can set the format ID in the progress information and the
progress view can fetch the format name by the ID so that users can
see what format is being used in the COPY command.

>
> >          Extensions could register their own options within this
> > framework, for example:
> >
> > RegisterCustomCopyFormatOption(JsonLinesFormatId,
> >     "custom_option",
> >     custom_option_handler);
>
> Can we defer to discuss how to add support for custom
> options while we focus on the first implementation? Earlier
> patch sets with the current approach had custom options
> support but it's removed in the first implementation.

I think we can skip the custom option patch for the first
implementation but still need to discuss how we will be able to
implement it to understand the big picture of this feature. Otherwise
we could end up going the wrong direction.

>
> (BTW, I think that it's not a good API because we want COPY
> FROM only options and COPY TO only options something like
> "compression level".)

Why does this matter in terms of API? I think that even with this API
we can pass is_from to the option handler function so that it
validates the option based on it.

>
> > This approach offers several advantages: it would eliminate the
> > search_path issue, provide greater flexibility, and potentially
> > simplify the overall interface for users and developers alike.
>
> What contributes to the "flexibility"? Developers can call
> multiple Register* functions in _PG_Init(), right?

I think that with a tablesample-like approach we need to do everything
based on one handler function and callbacks returned from it whereas
there is no such limitation with C API style.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


Reply via email to