On Thu, Nov 16, 2017 at 3:34 PM, Amit Kapila <amit.kapil...@gmail.com> wrote: > On Wed, Nov 15, 2017 at 12:25 AM, Robert Haas <robertmh...@gmail.com> wrote: >> On Tue, Nov 14, 2017 at 11:00 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: >>> Yeah, I'm sure it is. I have a vague recollection that there might be >>> existing regression test cases exercising such things, though I won't >>> swear to that. The "orderstest" bit in subselect.sql looks like it >>> might be meant to do that... >> > > I agree that such cases can cause a problem with fixed memory. > I'm able to come up with a test case to produce the problem.
regression[107494]=# select ten,count(*) from tenk1 a where a.ten in (select b.ten from tenk1 b where (select a.ten from tenk1 c where c.ten = a.ten limit 1) = b.ten limit 1) group by a.ten; QUERY PLAN --------------------------------------------------- HashAggregate Group Key: a.ten -> Seq Scan on tenk1 a Filter: (SubPlan 2) SubPlan 2 -> Limit InitPlan 1 (returns $1) -> Limit -> Seq Scan on tenk1 c Filter: (ten = a.ten) -> Seq Scan on tenk1 b Filter: ($1 = ten) In the above case, Subplan 2 returns the same value of a.ten that is used in initplan as a reference. So, this query is nothing but select ten,count(*) from tenk1 group by a.ten. The output should be 10 rows each row having a count=1000. By enforcing parallelism, I got the following plan: QUERY PLAN ------------------------------------------------------ HashAggregate Group Key: a.ten -> Seq Scan on tenk1 a Filter: (SubPlan 2) SubPlan 2 -> Limit InitPlan 1 (returns $1) -> Limit -> Seq Scan on tenk1 c Filter: (ten = a.ten) -> Gather Workers Planned: 2 Params Evaluated: $1 -> Parallel Seq Scan on tenk1 b Filter: ($1 = ten) When I set parallel_leader_participation to off, I get the incorrect result as follows: ten | count ------------ 0 | 1000 (1 row) In the above case, the initplan gets evaluated only once from ExecInitParallelPlan path. Hence, we see the incorrect result. >> Here's an updated patch that attempts to work around this problem using DSA. >> > > There were a couple of problems with your changes: > 1. > BufferUsage *buffer_usage; /* points to bufusage area in DSM */ > + dsa_handle param_exec; /* serialized PARAM_EXEC parameters */ > @@ -35,12 +36,13 @@ typedef struct ParallelExecutorInfo > } ParallelExecutorInfo; > > This should be dsa_pointer, otherwise, the value returned by > SerializeParamExecParams will get truncated. > > 2. In ExecParallelReinitialize(), we need to evaluate the params > before serializing them. > > 3. I think we should free the dsa pointer at the end of the gather. > > Attached patch fixes the mentioned problems. > >> It could use a test case that actually tickles the new logic in >> ExecParallelReinitialize ... this is mostly just to show the concept. >> > > Thanks, it was quite helpful. > I've tested the above-mentioned scenario with this patch and it is working fine. Also, I've created a text column named 'vartext', inserted some random length texts(max length 100) and tweaked the above query as follows: select ten,count(*) from tenk1 a where a.ten in (select b.ten from tenk1 b where (select a.vartext from tenk1 c where c.ten = a.ten limit 1) = b.vartext limit 1) group by a.ten; This query is equivalent to select ten,count(*) from tenk1 group by a.ten. It also produced the expected result without throwing any error. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com