On Thu, May 22, 2025 at 4:05 PM Richard Guo <guofengli...@gmail.com> wrote:
> Therefore, I'm thinking that maybe we could create a new RelOptInfo
> for the RHS rel to represent its unique-ified version, and then
> generate all worthwhile paths for it, similar to how it's done in
> create_distinct_paths().  Since this is likely to be called repeatedly
> on the same rel, we can cache the new RelOptInfo in the rel struct,
> just like how we cache cheapest_unique_path currently.
>
> To be concrete, I'm envisioning something like the following:
>
>             if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
> -               create_unique_path(root, rel2, rel2->cheapest_total_path,
> -                                  sjinfo) != NULL)
> +               (rel2_unique = create_unique_rel(root, rel2, sjinfo)) != NULL)
>
> ...
>
> -               add_paths_to_joinrel(root, joinrel, rel1, rel2,
> -                                    JOIN_UNIQUE_INNER, sjinfo,
> +               add_paths_to_joinrel(root, joinrel, rel1, rel2_unique,
> +                                    JOIN_INNER, sjinfo,
>                                      restrictlist);
> -               add_paths_to_joinrel(root, joinrel, rel2, rel1,
> -                                    JOIN_UNIQUE_OUTER, sjinfo,
> +               add_paths_to_joinrel(root, joinrel, rel2_unique, rel1,
> +                                    JOIN_INNER, sjinfo,
>                                      restrictlist);
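
For readers unfamiliar with the caching pattern mentioned in the quoted text, here is a toy sketch of "build once, cache in the rel struct".  It is not code from the patch: ToyRel, its unique_rel field, toy_build_count and toy_create_unique_rel() are hypothetical names invented for illustration, and the real RelOptInfo carries far more state.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a planner relation; not PostgreSQL's RelOptInfo. */
typedef struct ToyRel
{
    int             relid;
    int             is_unique;   /* 1 if this rel represents unique-ified output */
    struct ToyRel  *unique_rel;  /* cached unique-ified version, or NULL */
} ToyRel;

/* Counts how many times the unique-ified rel was actually built. */
static int toy_build_count = 0;

/*
 * Return the unique-ified version of rel, building it on the first call
 * and returning the cached copy on every later call.
 */
static ToyRel *
toy_create_unique_rel(ToyRel *rel)
{
    ToyRel *u;

    assert(rel != NULL);

    /* Fast path: a previous call already did the work. */
    if (rel->unique_rel != NULL)
        return rel->unique_rel;

    u = malloc(sizeof(ToyRel));
    if (u == NULL)
        return NULL;

    u->relid = rel->relid;
    u->is_unique = 1;
    u->unique_rel = NULL;
    toy_build_count++;

    /* Cache the result so repeated join-order attempts reuse it. */
    rel->unique_rel = u;
    return u;
}
```

The point of the sketch is only that repeated calls for the same rel return the same object, so the cost of generating its paths is paid once.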

Here is a WIP draft patch based on this idea.  It retains
JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to help determine whether the
inner relation is provably unique, but otherwise removes most of the
code related to these two join types.

Additionally, the T_Unique path now has the same meaning for both
semijoins and DISTINCT clauses: it represents adjacent-duplicate
removal on presorted input.  This patch unifies their handling by
sharing the same data structures and functions.
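
To make "adjacent-duplicate removal on presorted input" concrete, here is a minimal standalone sketch.  It is not executor code: unique_adjacent() is a hypothetical helper that works on a plain int array, whereas the real node compares tuples on the unique-ification columns.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Drop adjacent duplicates from vals[0..n-1] in place and return the new
 * length.  The result is duplicate-free only if the input is already
 * sorted, which is why the input must be presorted.
 */
static size_t
unique_adjacent(int *vals, size_t n)
{
    size_t out = 0;

    assert(vals != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        /* Keep a value only if it differs from the last value kept. */
        if (out == 0 || vals[out - 1] != vals[i])
            vals[out++] = vals[i];
    }
    return out;
}
```

Because each value is compared only against the previously kept one, the output keeps the input's sort order, which is consistent with unique paths being able to preserve pathkeys.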

There are a few plan diffs in the regression tests.  As far as I can
tell, the changes are improvements.  One of them is caused by the fact
that we now consider parameterized paths in unique-ified cases.  The
rest are mostly a result of now preserving pathkeys for unique paths.

This patch is still a work in progress.  Before investing too much
time into it, I'd like to get some feedback on whether it's heading in
the right direction.

Thanks
Richard

Attachment: v1-0001-Pathify-RHS-unique-ification-for-semijoin-plannin.patch