Hi, In <CAD21AoBuEqcz2_+dpA3WTiDUF=fgudpbkwm+nvh+qht-k4p...@mail.gmail.com> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 1 May 2025 12:15:30 -0700, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> One of the primary considerations we need to address is the treatment > of the specified format name. The current patch set utilizes built-in > formats (namely 'csv', 'text', and 'binary') when the format name is > either unqualified or explicitly specified with 'pg_catalog' as the > schema. In all other cases, we search for custom format handler > functions based on the search_path. To be frank, I have reservations > about this interface design, as the dependence of the specified custom > format name on the search_path could potentially confuse users. How about requiring schema for all custom formats? Valid: COPY ... TO ... (FORMAT 'text'); COPY ... TO ... (FORMAT 'my_schema.jsonlines'); Invalid: COPY ... TO ... (FORMAT 'jsonlines'); -- no schema COPY ... TO ... (FORMAT 'pg_catalog.text'); -- needless schema If we require "schema" for all custom formats, we don't need to depend on search_path. > In light of these concerns, I've been contemplating alternative > interface designs. One promising approach would involve registering > custom copy formats via a C function during module loading > (specifically, in _PG_init()). This method would require extension > authors to invoke a registration function, say > RegisterCustomCopyFormat(), in _PG_init() as follows: > > JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines", > &JsonLinesCopyToRoutine, > &JsonLinesCopyFromRoutine); > > The registration function would validate the format name and store it > in TopMemoryContext. It would then return a unique identifier that can > be used subsequently to reference the custom copy format extension. I don't object the suggested interface because I don't have a strong opinion how to implement this feature. Why do we need to assign a unique ID? For performance? For RegisterCustomCopyFormatOption()? I think that we don't need to use it so much in COPY. We don't need to use format name and assigned ID after we retrieve a corresponding Copy{To,From}Routine. Because all needed information are in Copy{To,From}Routine. > Extensions could register their own options within this > framework, for example: > > RegisterCustomCopyFormatOption(JsonLinesFormatId, > "custom_option", > custom_option_handler); Can we defer to discuss how to add support for custom options while we focus on the first implementation? Earlier patch sets with the current approach had custom options support but it's removed in the first implementation. (BTW, I think that it's not a good API because we want COPY FROM only options and COPY TO only options something like "compression level".) > This approach offers several advantages: it would eliminate the > search_path issue, provide greater flexibility, and potentially > simplify the overall interface for users and developers alike. What contributes to the "flexibility"? Developers can call multiple Register* functions in _PG_Init(), right? Thanks, -- kou