On Fri, Mar 8, 2024 at 11:31 AM David Rowley <dgrowle...@gmail.com> wrote:
> On Fri, 8 Mar 2024 at 00:39, Richard Guo <guofengli...@gmail.com> wrote: > > I would like to have another look, but it might take several days. > > Would that be too late? > > Please look. Several days is fine. I'd like to start looking on Monday > or Tuesday next week. I've had another look, and here are some comments that came to my mind. * There are cases where the setop_pathkeys of a subquery does not match the union_pathkeys generated in generate_union_paths() for sorting by the whole target list. In such cases, even if we have explicitly sorted the paths of the subquery to meet the ordering required for its setop_pathkeys, we cannot find appropriate ordered paths for MergeAppend. For instance, explain (costs off) (select a, b from t t1 where a = b) UNION (select a, b from t t2 where a = b); QUERY PLAN ----------------------------------------------------------- Unique -> Sort Sort Key: t1.a, t1.b -> Append -> Index Only Scan using t_a_b_idx on t t1 Filter: (a = b) -> Index Only Scan using t_a_b_idx on t t2 Filter: (a = b) (8 rows) (Assume t_a_b_idx is a btree index on t(a, b)) In this query, the setop_pathkeys of the subqueries includes only one PathKey because 'a' and 'b' are in the same EC inside the subqueries, while the union_pathkeys of the whole query includes two PathKeys, one for each target entry. After we convert the setop_pathkeys to outer representation, we'd notice that it does not match union_pathkeys. Consequently, we are unable to recognize that the index scan paths are already appropriately sorted, leading us to miss the opportunity to utilize MergeAppend. Not sure if this case is common enough to be worth paying attention to. * In build_setop_child_paths() we also create properly sorted partial paths, which seems not necessary because we do not support parallel merge append, right? * Another is minor and relates to cosmetic matters. When we unique-ify the result of a UNION, we take the number of distinct groups as equal to the total input size. For the Append path and Gather path, we use 'dNumGroups', which is 'rows' of the Append path. For the MergeAppend we use 'rows' of the MergeAppend path. I believe they are supposed to be the same, but I think it'd be better to keep them consistent: either use 'dNumGroups' for all the three kinds of paths, or use 'path->rows' for each path. Thanks Richard