On Tuesday, June 8, 2021 10:51 PM Robert Haas <robertmh...@gmail.com> > On Mon, Jun 7, 2021 at 11:33 PM Amit Kapila <amit.kapil...@gmail.com> > wrote: > > Note the error is raised after applying the patch, without the patch, > > the above won't show any error (error message could be improved here). > > Such cases can lead to unpredictable behavior without a patch because > > we won't be able to detect the execution of parallel-unsafe functions. > > There are similar examples from regression tests. Now, one way to deal > > with similar cases could be that we document them and say we don't > > consider parallel-safety in such cases and the other way is to detect > > such cases and error out. Yet another way could be that we somehow try > > to check these cases as well before enabling parallelism but I thought > > these cases fall in the similar category as aggregate's support > > functions. > > I'm not very excited about the idea of checking type input and type output > functions. It's hard to imagine someone wanting to do something > parallel-unsafe in such a function, unless they're just trying to prove a > point. So > I don't think checking it would be a good investment of CPU cycles. If we do > anything at all, I'd vote for just documenting that such functions should be > parallel-safe and that their parallel-safety marks are not checked when they > are > used as type input/output functions. Perhaps we ought to document the same > thing with regard to opclass support functions, another place where it's hard > to > imagine a realistic use case for doing something parallel-unsafe. > > In the case of aggregates, I see the issues slightly differently. I don't > know that > it's super-likely that someone would want to create a parallel-unsafe > aggregate function, but I think there should be a way to do it, just in case. > However, if somebody wants that, they can just mark the aggregate itself > unsafe. There's no benefit for the user to marking the aggregate safe and the > support functions unsafe and hoping that the system figures it out somehow. > > In my opinion, you're basically taking too pure a view of this. We're not > trying to > create a system that does such a good job checking parallel safety markings > that nobody can possibly find a thing that isn't checked no matter how hard > they poke around the dark corners of the system. Or at least we shouldn't be > trying to do that. We should be trying to create a system that works well in > practice, and gives people the flexibility to easily avoid parallelism when > they > have a query that is parallel-unsafe, while still getting the benefit of > parallelism > the rest of the time. > > I don't know what all the cases you've uncovered are, and maybe there's > something in there that I'd be more excited about changing if I knew what it > was, but the particular problems you're mentioning here seem more > theoretical than real to me.
I think another case that parallel unsafe function could be invoked in parallel mode is the TEXT SEARCH TEMPLATE's init_function or lexize_function. Because currently, the planner does not check the safety of these function. Please see the example below[1] I am not sure will user use parallel unsafe function in init_function or lexize_function, but if user does, it could cause unexpected result. Does it make sense to add some check for init_function or lexize_function or document this together with type input/output and opclass support functions ? [1]----------------------------EXAMPLE------------------------------------ CREATE FUNCTION dsnowball_init(INTERNAL) RETURNS INTERNAL AS '$libdir/dict_snowball', 'dsnowball_init' LANGUAGE C STRICT; CREATE FUNCTION dsnowball_lexize(INTERNAL, INTERNAL, INTERNAL, INTERNAL) RETURNS INTERNAL AS '$libdir/dict_snowball', 'dsnowball_lexize' LANGUAGE C STRICT; CREATE TEXT SEARCH TEMPLATE snowball (INIT = dsnowball_init, LEXIZE = dsnowball_lexize); COMMENT ON TEXT SEARCH TEMPLATE snowball IS 'snowball stemmer'; create table pendtest (ts tsvector); create index pendtest_idx on pendtest using gin(ts); insert into pendtest select (to_tsvector('Lore ipsum')) from generate_series(1,10000000,1); analyze; set enable_bitmapscan = off; postgres=# explain select * from pendtest where to_tsquery('345&qwerty') @@ ts; QUERY PLAN -------------------------------------------------------------------------------- Gather (cost=1000.00..1168292.86 rows=250 width=31) Workers Planned: 2 -> Parallel Seq Scan on pendtest (cost=0.00..1167267.86 rows=104 width=31) Filter: (to_tsquery('345&qwerty'::text) @@ ts) -- In the example above, dsnowball_init() and dsnowball_lexize() will be executed in parallel mode. ----------------------------EXAMPLE------------------------------------ Best regards, houzj