On Thu, May 22, 2025 at 4:05 PM Richard Guo <guofengli...@gmail.com> wrote:
> Therefore, I'm thinking that maybe we could create a new RelOptInfo
> for the RHS rel to represent its unique-ified version, and then
> generate all worthwhile paths for it, similar to how it's done in
> create_distinct_paths().  Since this is likely to be called repeatedly
> on the same rel, we can cache the new RelOptInfo in the rel struct,
> just like how we cache cheapest_unique_path currently.
>
> To be concrete, I'm envisioning something like the following:
>
>      if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
> -        create_unique_path(root, rel2, rel2->cheapest_total_path,
> -                           sjinfo) != NULL)
> +        (rel2_unique = create_unique_rel(root, rel2, sjinfo)) != NULL)
>
> ...
>
> -        add_paths_to_joinrel(root, joinrel, rel1, rel2,
> -                             JOIN_UNIQUE_INNER, sjinfo,
> +        add_paths_to_joinrel(root, joinrel, rel1, rel2_unique,
> +                             JOIN_INNER, sjinfo,
>                               restrictlist);
> -        add_paths_to_joinrel(root, joinrel, rel2, rel1,
> -                             JOIN_UNIQUE_OUTER, sjinfo,
> +        add_paths_to_joinrel(root, joinrel, rel2_unique, rel1,
> +                             JOIN_INNER, sjinfo,
>                               restrictlist);
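To make the caching idea in the quoted text concrete, here is a toy,
standalone sketch.  It is not planner code: ToyRel, its unique_rel field,
toy_create_unique_rel() and toy_build_count are hypothetical names standing
in for RelOptInfo and the proposed create_unique_rel(), which would also take
root and sjinfo and generate paths.  It only shows the build-once,
reuse-afterwards pattern.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for RelOptInfo; NOT the PostgreSQL struct. */
typedef struct ToyRel
{
    int            relid;       /* identifies the underlying relation */
    int            is_unique;   /* 1 if this is a unique-ified version */
    struct ToyRel *unique_rel;  /* cached unique-ified version, or NULL */
} ToyRel;

/* Counts how many unique-ified rels were actually built (for illustration). */
int toy_build_count = 0;

/*
 * Return the unique-ified version of "rel", building it on the first call
 * and returning the cached copy on every later call.  Returns NULL only if
 * allocation fails.
 */
ToyRel *
toy_create_unique_rel(ToyRel *rel)
{
    ToyRel *urel;

    assert(rel != NULL);

    /* Fast path: a previous call already built and cached the result. */
    if (rel->unique_rel != NULL)
        return rel->unique_rel;

    urel = calloc(1, sizeof(ToyRel));
    if (urel == NULL)
        return NULL;

    urel->relid = rel->relid;
    urel->is_unique = 1;
    toy_build_count++;

    /* Cache it in the parent rel so repeated calls are cheap. */
    rel->unique_rel = urel;
    return urel;
}
```

Calling toy_create_unique_rel() twice on the same ToyRel returns the same
pointer and builds the unique-ified rel only once, which is the property the
join search would rely on when it considers the same RHS rel repeatedly.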
Here is a WIP draft patch based on this idea.  It retains
JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to help determine whether the
inner relation is provably unique, but otherwise removes most of the
code related to these two join types.

Additionally, the T_Unique path now has the same meaning for both
semijoins and DISTINCT clauses: it represents adjacent-duplicate
removal on presorted input.  This patch unifies their handling by
sharing the same data structures and functions.

There are a few plan diffs in the regression tests.  As far as I can
tell, the changes are improvements.  One of them is caused by the fact
that we now consider parameterized paths in unique-ified cases.  The
rest are mostly a result of now preserving pathkeys for unique paths.

This patch is still a work in progress.  Before investing too much
time into it, I'd like to get some feedback on whether it's heading in
the right direction.

Thanks
Richard
v1-0001-Pathify-RHS-unique-ification-for-semijoin-plannin.patch