On Fri, May 3, 2024 at 9:31 PM Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, May 3, 2024 at 7:47 AM Richard Guo <guofengli...@gmail.com> wrote: > > I think one concern regarding performance cost is that the function > > exprs_known_equal() would be called O(N^2) times, where N is the number > > of partition key expressions. But I think this might not be a problem. > > The number of a joinrel's partition key expressions would only be equal > > to the join degree, since each base relation within the join contributes > > only one partition key expression, according to > > set_joinrel_partition_key_exprs(). This number would not scale with the > > number of partitions. But I have not measured the performance in > > practice by running benchmarks. Maybe I'm just wrong. > > I don't know, but I do think you should do some benchmarking and see > if you can find cases where this regresses performance. In my opinion, > this feature is worth having only if it's basically free. There's lots > of things we could do in the planner that would give better (perhaps > much better) plans in certain cases, but which we don't do because in > all other cases we'd pay a certain number of CPU cycles to have them > and it just doesn't make sense given how often we'd actually get a > benefit. This might be another such case. Thank you for the suggestion. In order to obtain a rough estimation of how this patch affects planning time, I did the following benchmarking: * create a partitioned table with 3 keys and 1000 partitions, which looks like Partitioned table "public.t1_parted" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- a | integer | | | b | integer | | | c | integer | | | d | integer | | | Partition key: RANGE (a, b, c) Number of partitions: 1000 (Use \d+ to list them.) * compose a query involving 5-way joins of this partitioned table, which looks like: select * from t1_parted t1 natural join t1_parted t2 natural join t1_parted t3 natural join t1_parted t4 natural join t1_parted t5 where t1.b = 1 and t1.c = 2; This query is composed in such a way that it could actually generate partitionwise join, because there exist equi-join condition for each pair of matching partition keys; but currently on master it is not able to generate partitionwise join, because of the filters 't1.b = 1 and t1.c = 2', which is the issue fixed by this patch. * run this query 5 times with enable_partitionwise_join set to on, and collect the average planning time on master and on patched. To ensure fairness, on master, a little hack is required to enable the generation of partitionwise join for this query. This allows us to eliminate any potential impact on planning partitionwise joins and evaluate the effects of this patch accurately. Below is what I got on my local machine. -- on master measurement | average | maximum | minimum | std_dev | std_dev_as_perc_of_avg ---------------+----------+----------+----------+---------+------------------------ planning time | 30355.07 | 33148.47 | 29020.82 | 1681.23 | 5.54% -- on patched measurement | average | maximum | minimum | std_dev | std_dev_as_perc_of_avg ---------------+----------+----------+----------+---------+------------------------ planning time | 30600.00 | 33523.23 | 28680.75 | 1861.90 | 6.08% -- without partitionwise join measurement | average | maximum | minimum | std_dev | std_dev_as_perc_of_avg ---------------+---------+---------+---------+---------+------------------------ planning time | 4840.18 | 5184.05 | 4528.87 | 299.98 | 6.20% So it seems that the planning time is not significantly affected by this patch, particularly when compared to the impact caused by partitionwise join. BTW, I was using Ashutosh's script [1] for setting up the benchmarking. I find the script very handy. [1] https://www.postgresql.org/message-id/flat/CAExHW5s%3DbCLMMq8n_bN6iU%2BPjau0DS3z_6Dn6iLE69ESmsPMJQ%40mail.gmail.com Thanks Richard