On Tue, Aug 15, 2017 at 7:16 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Robert Haas <robertmh...@gmail.com> writes: >> On Tue, Aug 15, 2017 at 7:31 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: >>> Attached patch fixes the issue for me. I have locally verified that >>> the gather merge gets executed in rescan path. I haven't added a test >>> case for the same as having gather or gather merge on the inner side >>> of join can be time-consuming. However, if you or others feel that it >>> is important to have a test to cover this code path, then I can try to >>> produce one. > >> Committed. > >> I believe that between this commit and the test-coverage commit from >> Andres, this open item is reasonably well addressed. If someone >> thinks more needs to be done, please specify. Thanks. > > How big a deal do we think test coverage is? It looks like > ExecReScanGatherMerge is identical logic to ExecReScanGather, > which *is* covered according to coverage.postgresql.org, but > it wouldn't be too surprising if they diverge in future. > > I should think it wouldn't be that expensive to create a test > case, if you already have test cases that invoke GatherMerge. > Adding a right join against a VALUES clause with a small number of > entries, and a non-mergeable/hashable join clause, ought to do it. >
I have done some experiments based on this idea to generate a test, but I think it is not as straightforward as it appears. Below are some of my experiments: Query that uses GatherMerge in regression tests --------------------------------------------------------------------- regression=# explain (costs off) select string4, count((unique2)) from tenk1 group by string4 order by string4; QUERY PLAN ---------------------------------------------------- Finalize GroupAggregate Group Key: string4 -> Gather Merge Workers Planned: 2 -> Partial GroupAggregate Group Key: string4 -> Sort Sort Key: string4 -> Parallel Seq Scan on tenk1 (9 rows) Modified Query ---------------------- regression=# explain (costs off) select tenk1.hundred, count((unique2)) from tenk1 right join (values (1,100), (2,200)) as t (two, hundred) on t.two > 1 and tenk1.hundred > 1 group by tenk1.hundred order by tenk1.hundred; QUERY PLAN -------------------------------------------------------------------------- Sort Sort Key: tenk1.hundred -> HashAggregate Group Key: tenk1.hundred -> Nested Loop Left Join Join Filter: ("*VALUES*".column1 > 1) -> Values Scan on "*VALUES*" -> Gather Workers Planned: 4 -> Parallel Index Scan using tenk1_hundred on tenk1 Index Cond: (hundred > 1) (11 rows) The cost of GatherMerge is always higher than Gather in this case which is quite obvious as GatherMerge has to perform some additional task. I am not aware of a way such that Grouping and Sorting can be pushed below parallel node (Gather/GatherMerge) in this case, if there is any such possibility, then it might prefer GatherMerge. Another way to make it parallel is, add a new guc enable_gather similar to enable_gathermerge and then set that to off, it will prefer GatherMerge in that case. I think it is anyway good to have such a guc. I will go and do it this way unless you have a better idea. Note - enable_gathermerge is not present in postgresql.conf. I think we should add it in the postgresql.conf.sample file. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers