On Tue, Jun 24, 2025 at 2:11 PM Sutou Kouhei <k...@clear-code.com> wrote: > > Hi, > > In <cad21aoa57owo6qytptxotcjdmcuj1tgl1ags95cvenolqvw...@mail.gmail.com> > "Re: Make COPY format extendable: Extract COPY TO format implementations" > on Tue, 24 Jun 2025 11:59:17 +0900, > Masahiko Sawada <sawada.m...@gmail.com> wrote: > > >> 1. This provides 2 registration APIs > >> (RegisterCopy{From,To}Routine(name, routine)) instead of > >> 1 registration API (RegisterCopyFormat(name, > >> from_routine, to_routine)). > >> > >> It's for simple implementation and easy to extend without > >> breaking APIs in the future. (And some formats may > >> provide only FROM routine or TO routine.) > >> > >> Is this design acceptable? > > > > With the single registration API idea, we can register the custom > > format routine that supports only FROM routine using the API like: > > > > RegisterCopyRoutine("new-format", NewFormatFromRoutine, NULL); > > > > Compared to this approach, what points do you think having separate > > two registration APIs is preferable in terms of extendability and API > > compatibility? > > It's natural to add more related APIs with this > approach. The single registration API provides one feature > by one operation. If we use the RegisterCopyRoutine() for > FROM and TO formats API, it's not natural that we add more > related APIs. In this case, some APIs may provide multiple > features by one operation and other APIs may provide single > feature by one operation. Developers may be confused with > the API. For example, developers may think "what does mean > NULL here?" or "can we use NULL here?" for > "RegisterCopyRoutine("new-format", NewFormatFromRoutine, > NULL)".
We can document it in the comment for the registration function. > > > > I think it would be rather confusing for example if > > each COPY TO routine and COPY FROM routine is registered by different > > extensions with the same format name. > > Hmm, I don't think so. Who is confused by the case? DBA? > Users who use COPY? Why is it confused? > > Do you assume the case that the same name is used for > different format? For example, "json" is used for JSON Lines > for COPY FROM and and normal JSON for COPY TO by different > extensions. Suppose that extension-A implements only CopyToRoutine for the custom-format-X with the format name 'myformat' and extension-B implements only CopyFromRoutine for the custom-format-Y with the same name, if users load both extension-A and extension-B, it seems to me that extension-A registers the custom-format-X format as 'myformat' only with CopyToRoutine, and extension-B overwrites the 'myformat' registration by adding custom-format-Y's CopyFromRoutine. However, if users register extension-C that implements both routines with the format name 'myformat', they can register neither extension-A nor extension-B, which seems to me that we don't allow overwriting the registration in this case. I think the core issue appears to be the internal management of custom format entries but the current patch does enable registration overwriting in the former case (extension-A and extension-B case). > > >> FYI: RegisterCopy{From,To}Routine() uses the same logic > >> as RegisterExtensionExplainOption(). > > > > I'm concerned that the patch has duplicated logics for the > > registration of COPY FROM and COPY TO. > > We can implement a convenient routine that can be used for > RegisterExtensionExplainOption() and > RegisterCopy{From,To}Routine() if it's needed. I meant there are duplicated codes in COPY FROM and COPY TO. For instance, RegisterCopyFromRoutine() and RegisterCopyToRoutine() have the same logic. > > >> 3. I want to register the built-in COPY {FROM,TO} routines > >> in the PostgreSQL initialization phase. Where should we > >> do it? In 0002, it's done in InitPostgres() but I'm not > >> sure whether it's a suitable location or not. > > > > InitPostgres() is not a correct function as it's a process > > initialization function. Probably we don't necessarily need to > > register the built-in formats in the same way as custom formats. A > > simple solution would be to have separate arrays for built-in formats > > and custom formats and have the GetCopy[To|From]Routine() search both > > arrays (built-in array first). > > We had a discussion that we should dog-food APIs: > > https://www.postgresql.org/message-id/flat/CAKFQuwaCHhrS%2BRE4p_OO6d7WEskd9b86-2cYcvChNkrP%2B7PJ7A%40mail.gmail.com#e6d1cdd04dac53eafe34b784ac47b68b > > > We should (and usually do) dog-food APIs when reasonable > > and this situation seems quite reasonable. > > In this case, we don't need to dog-food APIs, right? Yes, I think so. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com