Hi,

In <cad21aobrstmpydai_qvr-xoe7pl722dazm70a+fpvgy2hfs...@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 9 May 2025 17:57:35 -0700,
  Masahiko Sawada <sawada.m...@gmail.com> wrote:

>> Proposed approaches to register custom COPY formats:
>> a. Create a function that has the same name of custom COPY
>>    format
>> b. Call a register function from _PG_init()
>>
>> FYI: I proposed c. approach that uses a. but it always
>> requires schema name for format name in other e-mail.
>
> With approach (c), do you mean that we require users to change all
> FORMAT option values like from 'text' to 'pg_catalog.text' after the
> upgrade? Or are we exempt the built-in formats?

The latter. 'text' must be accepted because existing pg_dump
results use 'text'. If we rejected 'text', it would be a big
incompatibility. (We couldn't dump on an old PostgreSQL and
restore to a new PostgreSQL.)

>> Users can register the same format name:
>> a. Yes
>>    * Users can distinct the same format name by schema name
>>    * If format name doesn't have schema name, the used
>>      format depends on search_path
>>    * Pros:
>>      * Using schema for it is consistent with other
>>        PostgreSQL mechanisms
>>      * Custom format never conflict with built-in
>>        format. For example, an extension register "xml" and
>>        PostgreSQL adds "xml" later, they are never
>>        conflicted because PostgreSQL's "xml" is registered
>>        to pg_catalog.
>>    * Cons: Different format may be used with the same
>>      input. For example, "jsonlines" may choose
>>      "jsonlines" implemented by extension X or implemented
>>      by extension Y when search_path is different.
>> b. No
>>    * Users can use "${schema}.${name}" for format name
>>      that mimics PostgreSQL's builtin schema (but it's just
>>      a string)
>>
>>
>> Built-in formats (text/csv/binary) should be able to
>> overwritten by extensions:
>> a. (The current patch is no but David's answer is) Yes
>>    * Pros: Users can use drop-in replacement faster
>>      implementation without changing input
>>    * Cons: Users may overwrite them accidentally.
>>      It may break pg_dump result.
>>      (This is called as "backward incompatibility.")
>> b. No
>
> The summary matches my understanding. I think the second point is
> important. If we go with a tablesample-like API, I agree with David's
> point that all FORMAT values including the built-in formats should
> depend on the search_path value. While it provides a similar user
> experience to other database objects, there is a possibility that a
> COPY with built-in format could work differently on v19 than v18 or
> earlier depending on the search_path value.

Thanks for sharing additional points.

David said that this additional case is the responsibility of
the DBA, not PostgreSQL, right?

As I already said, I don't have a strong opinion on which
approach is better. My answer to the (important) second
point is "No". I feel that the pro of a. isn't realistic. If
users want to improve text/csv/binary performance (or
something else), they should improve PostgreSQL itself
instead of replacing the built-in format with an extension.
(Or they should create another custom COPY format such as
"faster_text", not "text".)

So I'm OK with approach b.

>> Are there any missing or wrong items?
>
> I think the approach (b) provides more flexibility than (a) in terms
> of API design as with (a) we need to do everything based on one
> handler function and callbacks.

Thanks for sharing this missing point.

I have a concern that the flexibility may introduce needless
complexity. If that's not a real concern, I'm OK with
approach b.

>> If we can summarize
>> the current discussion here correctly, others will be able
>> to chime in this discussion. (At least I can do it.)
>
> +1

Is anyone else interested in the design of the custom COPY
FORMAT implementation? If nobody else is, let's decide it
between ourselves.


Thanks,
--
kou
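
P.S. To make the two lookup rules in the summary concrete, here is a
minimal sketch in Python. It is only a toy model, not PostgreSQL code
and not what the patch does: SchemaRegistry, FlatRegistry, the
protect_builtins switch and the string "handlers" are all hypothetical
names made up for illustration. It assumes that pg_catalog is searched
first unless search_path lists it explicitly, as for other PostgreSQL
objects.

```python
"""Toy model of the COPY format name lookup rules discussed above."""

BUILTIN_SCHEMA = "pg_catalog"
BUILTIN_FORMATS = ("text", "csv", "binary")


class SchemaRegistry:
    """'Users can register the same format name: a. Yes' (hypothetical)."""

    def __init__(self):
        # Built-in formats live in pg_catalog; handlers are plain strings here.
        self._formats = {
            (BUILTIN_SCHEMA, name): f"builtin {name}" for name in BUILTIN_FORMATS
        }

    def register(self, schema, name, handler):
        if schema == BUILTIN_SCHEMA:
            raise ValueError("extensions cannot register into pg_catalog")
        self._formats[(schema, name)] = handler

    def resolve(self, format_name, search_path, protect_builtins=True):
        # A schema-qualified name bypasses search_path entirely.
        if "." in format_name:
            key = tuple(format_name.split(".", 1))
            if key not in self._formats:
                raise LookupError(f"unknown COPY format: {format_name}")
            return self._formats[key]
        path = list(search_path)
        # protect_builtins=True models "built-ins can't be overwritten":
        # pg_catalog always wins. Otherwise pg_catalog is only implicitly
        # first when search_path does not mention it.
        if protect_builtins or BUILTIN_SCHEMA not in path:
            path = [BUILTIN_SCHEMA] + [s for s in path if s != BUILTIN_SCHEMA]
        for schema in path:
            handler = self._formats.get((schema, format_name))
            if handler is not None:
                return handler
        raise LookupError(f"unknown COPY format: {format_name}")


class FlatRegistry:
    """'Users can register the same format name: b. No' (hypothetical)."""

    def __init__(self):
        self._formats = {name: f"builtin {name}" for name in BUILTIN_FORMATS}

    def register(self, name, handler):
        # "${schema}.${name}" is allowed, but it is just a string.
        if name in self._formats:
            raise ValueError(f"COPY format {name!r} is already registered")
        self._formats[name] = handler

    def resolve(self, name):
        if name not in self._formats:
            raise LookupError(f"unknown COPY format: {name}")
        return self._formats[name]
```

With SchemaRegistry, an unqualified "jsonlines" registered by both
extension X and extension Y resolves to whichever schema comes first in
search_path (the "Cons" above). With protect_builtins=True, 'text'
always resolves to the built-in one, so existing pg_dump results keep
working. With FlatRegistry, a second registration of the same name,
including "text", is simply an error.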