On Fri, Mar 21, 2025 at 5:32 PM David G. Johnston <david.g.johns...@gmail.com> wrote: > > On Tue, Mar 18, 2025 at 7:56 PM Sutou Kouhei <k...@clear-code.com> wrote: >> >> Hi, >> >> In <CAD21AoDU=byrddy8mzcxafg4h9xtetbdm-wvjao1t4ucsec...@mail.gmail.com> >> "Re: Make COPY format extendable: Extract COPY TO format implementations" >> on Mon, 17 Mar 2025 13:50:03 -0700, >> Masahiko Sawada <sawada.m...@gmail.com> wrote: >> >> > I think that built-in formats also need to have their handler >> > functions. This seems to be a conventional way for customizable >> > features such as tablesample and access methods, and we can simplify >> > this function. >> >> OK. 0008 in the attached v37 patch set does it. >> > > tl/dr; > > We need to exclude from our SQL function search any function that doesn't > declare copy_handler as its return type. > ("function text must return type copy_handler" is not an acceptable error > message) > > We need to accept identifiers in FORMAT and parse the optional catalog, > schema, and object name portions. > (Restrict our function search to the named schema if provided.) > > Detail: > > Fun thing...(not sure how much of this is covered above: I do see, but didn't > scour, the security discussion): > > -- create some poison > create function public.text(internal) returns boolean language c as > '/home/davidj/gotya/gotya', 'gotit'; > CREATE FUNCTION > > -- inject it > postgres=# set search_path to public,pg_catalog; > SET > > -- watch it die > postgres=# copy (select 1) to stdout (format text); > ERROR: function text must return type copy_handler > LINE 1: copy (select 1) to stdout (format text); > > I'm especially concerned about extensions here. > > We shouldn't be locating any SQL function that doesn't have a copy_handler > return type. Unfortunately, LookupFuncName seems incapable of doing what we > want here. I suggest we create a new lookup routine where we can specify the > return argument type as a required element. That would cleanly mitigate the > denial-of-service attack/accident vector demonstrated above (the text > returning function should have zero impact on how this feature behaves). If > someone does create a handler SQL function without using copy_handler return > type we'd end up showing "COPY format 'david' not recognized" - a developer > should be able to figure out they didn't put a correct return type on their > handler function and that is why the system did not register it.
Just to be clear, the patch checks the function's return type before calling it: funcargtypes[0] = INTERNALOID; handlerOid = LookupFuncName(list_make1(makeString(format)), 1, funcargtypes, true); if (!OidIsValid(handlerOid)) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("COPY format \"%s\" not recognized", format), parser_errposition(pstate, defel->location))); /* check that handler has correct return type */ if (get_func_rettype(handlerOid) != COPY_HANDLEROID) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("function %s must return type %s", format, "copy_handler"), parser_errposition(pstate, defel->location))); So would changing the error message to like "COPY format 'text' not recognized" untangle your concern? FYI the same is true for TABLESAMPLE; it invokes a function with the specified method name and checks the returned Node type: =# select * from pg_class tablesample text (0); ERROR: function text must return type tsm_handler A difference between TABLESAMPLE and COPY format is that the former accepts a qualified name but the latter doesn't: =# create extension tsm_system_rows ; =# create schema s1; =# create function s1.system_rows(internal) returns void language c as 'tsm_system_rows.so', 'tsm_system_rows_handler'; =# \df *.system_rows List of functions Schema | Name | Result data type | Argument data types | Type --------+-------------+------------------+---------------------+------ public | system_rows | tsm_handler | internal | func s1 | system_rows | void | internal | func (2 rows) postgres(1:1194923)=# select count(*) from pg_class tablesample system_rows(0); count ------- 0 (1 row) postgres(1:1194923)=# select count(*) from pg_class tablesample s1.system_rows(0); ERROR: function s1.system_rows must return type tsm_handler > A second concern is simply people wanting to name things the same; or, why > namespaces were invented. Yeah, I think that the custom COPY format should support qualified names at least. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com