Hi, thanks for your reply.

> But this won't be consistent even for non-parallel plans.

If we do not use the distributive law of parallel join, it seems OK.
If we generate a parallel plan using the distributive law of the join, then the
assumption this transformation relies on might be broken.

> Currently, we don't consider volatile functions as parallel-safe by default.

I ran the following SQL on PG12:

zlv=# select count(proname) from pg_proc where provolatile = 'v' and proparallel ='s';
 count
-------
   100
(1 row)

zlv=# select proname from pg_proc where provolatile = 'v' and proparallel ='s';
                proname
----------------------------------------
 timeofday
 bthandler
 hashhandler
 gisthandler
 ginhandler
 spghandler
 brinhandler

It seems there are many functions that are both volatile and parallel-safe.

________________________________
From: Amit Kapila <amit.kapil...@gmail.com>
Sent: Thursday, July 16, 2020 12:07 PM
To: Zhenghua Lyu <z...@vmware.com>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Volatile Functions in Parallel Plans

On Wed, Jul 15, 2020 at 6:14 PM Zhenghua Lyu <z...@vmware.com> wrote:
>
> The first plan:
>
>  Finalize Aggregate
>    ->  Gather
>          Workers Planned: 2
>          ->  Partial Aggregate
>                ->  Nested Loop
>                      Join Filter: (t3.c1 = t4.c1)
>                      ->  Parallel Seq Scan on t3
>                            Filter: (c1 ~~ '%sss'::text)
>                      ->  Seq Scan on t4
>                            Filter: (timeofday() = c1)
>
> The join's left tree is a parallel scan and the right tree is a seq scan.
> This algorithm is correct by the distributive law of
> distributed join:
>        A = [A1 A2 A3...An], B then A join B = gather( (A1 join B) (A2 join B)
> ... (An join B) )
>
> The correctness of the above law rests on an assumption:
>       The data set of B is the same in each join: (A1 join B) (A2 join B) ...
> (An join B)
>
> But things get complicated when volatile functions come in. Timeofday is just
> an example to show the idea. The core point is that volatile functions can return
> different
> results on successive calls with the same arguments.
> Thus the following piece,
> the right tree of the join
>              ->  Seq Scan on t4
>                    Filter: (timeofday() = c1)
> can not be considered consistent everywhere in the scan workers.
>

But this won't be consistent even for non-parallel plans.  I mean to
say for each loop of join the "Seq Scan on t4" would give different
results.  Currently, we don't consider volatile functions as
parallel-safe by default.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
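
P.S. To make the assumption behind the distributive law concrete, here is a
small Python model of the two plans. It is only a sketch and not PostgreSQL
code: the function names, the round-robin split of A, and the counter-based
filter standing in for timeofday() are all my own assumptions for illustration.
Each simulated worker gets its own copy of the filter state, and the inner side
is rescanned and refiltered for every outer row, as in a Nested Loop without a
Materialize node.

```python
from itertools import count

A = [1, 2, 3, 4]   # outer relation (think t3), split across workers
B = [1, 2, 3]      # inner relation (think t4), scanned in full by every worker


def nested_loop_join(outer_rows, inner_rows, keep):
    """Equi-join that rescans and refilters the inner side for each outer row."""
    return [(a, b) for a in outer_rows for b in inner_rows if keep(b) and a == b]


def stable_filter():
    # Depends only on its argument, so every scan of B sees the same rows.
    return lambda row: row % 2 == 1


def volatile_filter():
    # Hypothetical stand-in for timeofday(): the result depends on a hidden
    # call counter, so the same argument can give different answers.
    calls = count()
    return lambda row: next(calls) % 2 == 0


def serial_join(outer, inner, make_filter):
    """Non-parallel plan: one process, one filter state."""
    return sorted(nested_loop_join(outer, inner, make_filter()))


def parallel_join(outer, inner, make_filter, workers=2):
    """gather((A1 join B) ... (An join B)) with per-worker filter state."""
    gathered = []
    for i in range(workers):
        partition = outer[i::workers]  # round-robin split of A (an assumption)
        gathered.extend(nested_loop_join(partition, inner, make_filter()))
    return sorted(gathered)


if __name__ == "__main__":
    print("stable   serial  :", serial_join(A, B, stable_filter))
    print("stable   parallel:", parallel_join(A, B, stable_filter))
    print("volatile serial  :", serial_join(A, B, volatile_filter))
    print("volatile parallel:", parallel_join(A, B, volatile_filter))
```

With the stable filter the serial and gathered results agree. With the
counter-based filter they differ in this model, because each worker sees a
different "B", and the serial result itself depends on how many times the
filter has already been called, which matches the point that the scan is not
consistent across loops even without parallelism.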