On Wed, May 1, 2024 at 1:31 AM Robert Haas <robertmh...@gmail.com> wrote:
> I think it's slightly questionable whether this patch is worthwhile. > The case memorialized in the regression tests, t1.a = t2.a AND t1.a = > t2.b, is a very weird thing to do. The case mentioned in the original > email, foo.k1 = bar.k1 and foo.k2 = bar.k2 and foo.k2 = 16, seems like > something that could realistically happen, especially when there are > views in use (e.g. the view joins foo and bar, and then someone > queries the view for one of the join columns). In such a case, it's > possible that foo.k2 = 16 is selective enough that we really don't > care about partition-wise join any more, but it's also possible that > it's not too selective and we do care about partition-wise join. So I > don't think that the case that the patch fixes is something that can > ever happen, but I do think it's probably fairly rare that brings any > benefit, which is why I thought that EC-based matching was an OK > approach to this problem initially. Perhaps that was the wrong idea, > though. Thank you for taking the time to review this patch! I think Ashutosh also mentioned that the new added test case looks artificial. I must admit that I'm not too sure how common we encounter queries with partially-redundant join clauses in real-life scenarios. It is possible that such cases are quite rare, and this patch will then not be of much use. I initially brought up this issue because I noticed an inconsistency regarding the generation of a partition-wise join: with 't1.k = t2.k and t1.k = t2.val' we are able to generate a partition-wise join, while its equivalent form 't1.k = t2.val and t1.k = t2.k' does not result in a partition-wise join. I think this inconsistency could be confusing. The reason behind this is that with 't1.k = t2.val and t1.k = t2.k' it happens to constrain other members (t1.k and t2.val) of the EC than the ones we are looking for (t1.k and t2.k). Our current code looks through the join's restriction clauses for matched keys. In addition to that, this patch checks to see if any unmatched keys are known equal by ECs, leveraging function exprs_known_equal(). > Does the additional logic added by this patch have a noticeable > performance cost? I think one concern regarding performance cost is that the function exprs_known_equal() would be called O(N^2) times, where N is the number of partition key expressions. But I think this might not be a problem. The number of a joinrel's partition key expressions would only be equal to the join degree, since each base relation within the join contributes only one partition key expression, according to set_joinrel_partition_key_exprs(). This number would not scale with the number of partitions. But I have not measured the performance in practice by running benchmarks. Maybe I'm just wrong. Thanks Richard