Hi,

In <cad21aobrstmpydai_qvr-xoe7pl722dazm70a+fpvgy2hfs...@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 9 May 2025 17:57:35 -0700,
  Masahiko Sawada <sawada.m...@gmail.com> wrote:

>> Proposed approaches to register custom COPY formats:
>> a. Create a function that has the same name as the custom COPY
>>    format
>> b. Call a register function from _PG_init()
>>
>> FYI: In another e-mail, I proposed approach c., which uses a. but
>> always requires a schema name in the format name.
> 
> With approach (c), do you mean that we require users to change all
> FORMAT option values like from 'text' to 'pg_catalog.text' after the
> upgrade? Or do we exempt the built-in formats?

The latter. 'text' must be accepted because existing pg_dump
output uses 'text'. If we rejected 'text', it would be a big
incompatibility. (We couldn't restore a dump taken on an older
PostgreSQL into a newer PostgreSQL.)
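
To make the exemption concrete, here is a rough sketch of how a
FORMAT value could be classified under approach c. It is only an
illustration: the enum, classify_format() and the rule that a dot
means "schema-qualified" are my assumptions, not code from the
patch, and a real implementation would parse a qualified name and
look it up in the catalogs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a COPY FORMAT value under approach c. */
typedef enum
{
    FORMAT_BUILTIN,     /* text, csv or binary: accepted without a schema */
    FORMAT_QUALIFIED,   /* "schema.name": looked up in that schema */
    FORMAT_INVALID      /* unqualified non-built-in name: rejected */
} FormatKind;

static const char *const builtin_formats[] = {"text", "csv", "binary"};

static bool
is_builtin_format(const char *name)
{
    for (size_t i = 0; i < sizeof(builtin_formats) / sizeof(builtin_formats[0]); i++)
        if (strcmp(name, builtin_formats[i]) == 0)
            return true;
    return false;
}

static FormatKind
classify_format(const char *name)
{
    assert(name != NULL);

    /* Built-in names stay valid unqualified so existing pg_dump output restores. */
    if (is_builtin_format(name))
        return FORMAT_BUILTIN;

    /* Every other format must be written as "schema.name" (both parts non-empty). */
    const char *dot = strchr(name, '.');
    if (dot != NULL && dot != name && dot[1] != '\0')
        return FORMAT_QUALIFIED;

    return FORMAT_INVALID;
}
```

With this rule, FORMAT 'text' keeps working, FORMAT 'myext.jsonlines'
is accepted, and FORMAT 'jsonlines' is rejected because it has no
schema name.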


>> Users can register the same format name:
>> a. Yes
>>    * Users can distinguish formats with the same name by
>>      schema name
>>    * If a format name isn't schema-qualified, the format
>>      used depends on search_path
>>      * Pros:
>>        * Using schema for it is consistent with other
>>          PostgreSQL mechanisms
>>        * Custom formats never conflict with built-in
>>          formats. For example, if an extension registers "xml"
>>          and PostgreSQL adds "xml" later, they don't
>>          conflict because PostgreSQL's "xml" is registered
>>          in pg_catalog.
>>      * Cons: Different formats may be used for the same
>>        input. For example, "jsonlines" may resolve to the
>>        "jsonlines" implemented by extension X or to the one
>>        implemented by extension Y, depending on search_path.
>> b. No
>>    * Users can use "${schema}.${name}" as a format name
>>      to mimic PostgreSQL's schema mechanism (but it's just
>>      a string)
>>
>>
>> Built-in formats (text/csv/binary) should be overridable
>> by extensions:
>> a. Yes (the current patch says no, but David's answer is yes)
>>    * Pros: Users can use a faster drop-in replacement
>>      without changing their input
>>    * Cons: Users may override them accidentally, which
>>      may break pg_dump results.
>>      (This is the "backward incompatibility" concern.)
>> b. No
> 
> The summary matches my understanding. I think the second point is
> important. If we go with a tablesample-like API, I agree with David's
> point that all FORMAT values including the built-in formats should
> depend on the search_path value. While it provides a similar user
> experience to other database objects, there is a possibility that a
> COPY with built-in format could work differently on v19 than v18 or
> earlier depending on the search_path value.

Thanks for sharing additional points.

David said that the case you describe is the DBA's
responsibility, not PostgreSQL's, right?
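
To make the search_path concern concrete, here is a simplified
model of resolving an unqualified format name by walking
search_path. The FormatEntry type and resolve_format() are
hypothetical. Note that real PostgreSQL searches pg_catalog first
unless pg_catalog is listed explicitly in search_path; this model
only covers the explicit case, which is exactly when a user schema
can shadow a built-in name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical catalog entry: a format name living in a schema. */
typedef struct
{
    const char *schema;
    const char *name;
} FormatEntry;

/*
 * Walk search_path in order and return the schema of the first entry whose
 * name matches, or NULL if no schema on the path provides the format.
 */
static const char *
resolve_format(const char *name,
               const char *const *search_path, size_t n_path,
               const FormatEntry *catalog, size_t n_catalog)
{
    assert(name != NULL);
    for (size_t i = 0; i < n_path; i++)
        for (size_t j = 0; j < n_catalog; j++)
            if (strcmp(catalog[j].schema, search_path[i]) == 0 &&
                strcmp(catalog[j].name, name) == 0)
                return catalog[j].schema;
    return NULL;
}
```

With a catalog containing pg_catalog.text and myschema.text, the
path (pg_catalog, myschema) resolves 'text' to pg_catalog, but the
path (myschema, pg_catalog) resolves it to myschema. That is how a
COPY ... (FORMAT 'text') could behave differently on v19 than on
v18.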


As I already said, I don't have a strong opinion on which
approach is better. My answer to the (important) second
point is no. I feel that the pro of a. isn't realistic. If
users want to improve text/csv/binary performance (or
anything else), they should improve PostgreSQL itself instead
of replacing a built-in format with an extension. (Or they
should create a separate custom COPY format such as
"faster_text" rather than "text".)


So I'm OK with approach b.

>> Are there any missing or wrong items?
> 
> I think the approach (b) provides more flexibility than (a) in terms
> of API design as with (a) we need to do everything based on one
> handler function and callbacks.

Thanks for sharing this missing point.

I'm concerned that the flexibility may introduce needless
complexity. If that isn't a real concern, I'm OK with
approach b.
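
For discussion, here is a rough sketch of approach b combined with
the "No" answers to the other two questions: an extension calls a
register function from its _PG_init(), and registration fails for
built-in names and for names that are already taken. _PG_init() is
the real module-load hook, but RegisterCopyFormat(),
LookupCopyFormat() and the CopyFormatRoutine callbacks are
hypothetical names for illustration only; in PostgreSQL a failure
would be an ERROR rather than a false return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback set a custom format would provide. */
typedef struct
{
    void (*start)(void *state);
    void (*one_row)(void *state, const char *row);
    void (*end)(void *state);
} CopyFormatRoutine;

typedef struct
{
    const char *name;                   /* assumed to outlive the registry */
    const CopyFormatRoutine *routine;
} RegisteredFormat;

#define MAX_CUSTOM_FORMATS 16

static RegisteredFormat registry[MAX_CUSTOM_FORMATS];
static size_t n_registered = 0;

/* Would be called from an extension's _PG_init(). */
static bool
RegisterCopyFormat(const char *name, const CopyFormatRoutine *routine)
{
    static const char *const builtins[] = {"text", "csv", "binary"};

    assert(name != NULL && routine != NULL);
    for (size_t i = 0; i < sizeof(builtins) / sizeof(builtins[0]); i++)
        if (strcmp(name, builtins[i]) == 0)
            return false;       /* built-in formats cannot be overridden */
    for (size_t i = 0; i < n_registered; i++)
        if (strcmp(name, registry[i].name) == 0)
            return false;       /* two extensions cannot share a name */
    if (n_registered == MAX_CUSTOM_FORMATS)
        return false;           /* registry full */

    registry[n_registered].name = name;
    registry[n_registered].routine = routine;
    n_registered++;
    return true;
}

/* Returns the registered callbacks, or NULL for unknown (or built-in) names. */
static const CopyFormatRoutine *
LookupCopyFormat(const char *name)
{
    for (size_t i = 0; i < n_registered; i++)
        if (strcmp(name, registry[i].name) == 0)
            return registry[i].routine;
    return NULL;
}
```

Because the names are global strings, there is no search_path
dependency, and an extension that tries to register "text" or a
name another extension already took fails at load time instead of
silently changing COPY behavior.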


>> If we can summarize
>> the current discussion here correctly, others will be able
>> to chime in this discussion. (At least I can do it.)
> 
> +1

Is anyone else interested in the design of custom COPY
FORMAT implementations? If not, let's decide it
ourselves.


Thanks,
-- 
kou

