On Tue, Mar 19, 2024 at 8:18 AM Richard Guo <guofengli...@gmail.com> wrote:
> (Sorry it takes me some time to get back to this thread.) > > On Thu, Mar 7, 2024 at 7:13 PM Ashutosh Bapat < > ashutosh.bapat....@gmail.com> wrote: > >> I did a deeper review of the patch. Here are some comments >> > > Thank you for the review! > > >> Approach >> -------- >> The equijoin condition between partition keys doesn't appear in the >> join's restrictilist because of 'best_score' strategy as you explained well >> in [2]. What if we add an extra score for clauses between partition keys >> and give preference to equijoin between partition keys? Have you given it a >> thought? I feel that having an equijoin clause involving partition keys has >> more usages compared to a clause with any random column. E.g. nextloop may >> be able to prune partitions from inner relation if the clause contains a >> partition key. >> > > Hmm, I think this approach won't work in cases where one certain pair of > partition keys has formed an EC that contains pseudoconstants. In such > cases, the EC machinery will generate restriction clauses like 'pk = > const' rather than any join clauses. > That should be ok and more desirable. Clauses like pk = const will leave only one partition around in each of the joining relations thus PWJ won't be required OR it will be automatic - whichever way you see it. > > Besides, it seems to me that it's not a cheap operation to check whether > a join clause is between partition keys when we generate join clauses > from ECs in generate_join_implied_equalities(). > Why? The code would be the same as what we have in have_partkey_equi_join(). > > > Documentation >> ------------- >> The patch does not modify any documentation. The only documentation I >> could find about partitionwise join is the one for GUC >> 'enable_partitionwise_join'. It says >> --- quote >> "Partitionwise join currently applies only when the join conditions >> include all the partition keys, which must be of the same data type and >> have one-to-one matching sets of child partitions.". >> --- unquote >> This sentence is general and IMO covers the case this patch considers. >> But in general I feel that partitionwise join and aggregation deserve >> separate sections next to "partition pruning" in [1]; It should mention >> advanced partition matching algorithm as well. Would you be willing to >> write one and then expand it for the case in the patch? >> > > I don't think it should be part of this patch to add a new section in > the docs to explain partitionwise join and aggregation. Maybe that > deserves a separate patch? > Yes. 3. The testcase added looks artificial. it outputs data that has same value >> for two columns which is equal to the primary key of the other table - when >> would somebody do that?. Is there some real life example where this change >> will be useful? >> > > Hmm, I think the test case is good as long as it reveals the issue that > this patch fixes. It follows the same format as the existing test case > just above it. I'm not sure if there are real life examples, but I > think it may not always be necessary to derive test cases from them. > Let's defer this to the committer. > > >> Code >> ---- >> Minor comment for now. It will be better to increment num_equal_pks >> immediately after setting pk_known_equal[ipk] = true. Otherwise the code >> gets confusing around line 2269. I will spend more time reviewing the code >> next week. >> > > Hmm, the increment of num_equal_pks on line 2272 is parallel to the one > in the first loop (around line 2200). Maybe it's better to keep them > consistent as the current patch does? > > In the first loop, setting pk_known_equal[ipk1] = true and ++num_equal_pks happens on consecutive lines. That's not true in the second loop, where there are at least some code line where num_equal_pks is inconsistent with the number of "true" entries in pk_known_equal. We should avoid that. -- Best Wishes, Ashutosh Bapat