On Fri, Sep 30, 2022 at 3:44 PM Zheng Li <zhengl...@gmail.com> wrote:
> Hello,
>
> A bloom filter provides early filtering of rows that cannot be joined,
> before they would reach the join operator; this optimization is also
> called a semijoin filter (SJF) pushdown. Such a filter can be created
> when one child of the join operator must materialize its derived table
> before the other child is evaluated.
>
> For example, a bloom filter can be created using the join keys for the
> build/inner side of a hash join or the outer side of a merge join; the
> bloom filter can then be used to pre-filter rows on the other side of
> the join operator during the scan of the base relation. The thread
> "Hash Joins vs. Bloom Filters / take 2" [1] is a good discussion of
> using such an optimization for hash join, without going into pushdown
> of the filter, which could further increase the performance gain.
>
> We worked on prototyping bloom filter pushdown for both hash join and
> merge join. Attached is a patch set for bloom filter pushdown for
> merge join. We also plan to send the patch for hash join once we have
> it rebased.
>
> Here is a summary of the patch set:
>
> 1. Bloom filter pushdown optimizes Merge Join by filtering rows early
> during the table scan instead of later on.
>    - The bloom filter is pushed down along the execution tree to the
>      target SeqScan nodes.
>    - Experiments show that this optimization can speed up Merge Join
>      by up to 36%.
>
> 2. The planner makes the decision to use the bloom filter based on the
> estimated filtering rate and the expected performance gain.
>    - The planner accomplishes this by estimating four numbers per
>      variable: the total number of rows of the relation, the number
>      of distinct values of the variable, and the minimum and maximum
>      value of the variable (when applicable). Using these numbers,
>      the planner estimates the filtering rate of a potential filter.
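As an aside, here is a minimal sketch of how a filtering rate might fall out of min/max statistics, assuming uniformly distributed values; the function name and the uniformity assumption are illustrative only, not the patch's actual estimator (which also uses row counts and distinct-value counts):

```python
def estimate_filtering_rate(probe_min, probe_max, build_min, build_max):
    """Fraction of probe-side rows expected to find no match, assuming
    key values are uniform over [min, max]. Illustrative heuristic only;
    a real estimator would also weigh row and distinct-value counts."""
    overlap_lo = max(probe_min, build_min)
    overlap_hi = min(probe_max, build_max)
    if overlap_hi <= overlap_lo:
        return 1.0  # disjoint ranges: every probe row can be filtered
    return 1.0 - (overlap_hi - overlap_lo) / (probe_max - probe_min)

# e.g. probe keys 1..10M against build keys 4M..14M: ~40% of the probe
# range has no possible match on the build side.
rate = estimate_filtering_rate(1, 10_000_000, 4_000_000, 14_000_000)
print(round(rate, 2))  # 0.4
```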
>    - Because actually creating and applying the filter adds more
>      operations, there is a minimum filtering threshold below which
>      the filter would not be useful. Based on testing, we check
>      whether the estimated filtering rate is higher than 35%, and
>      that informs our decision to use a filter or not.
>
> 3. If using a bloom filter, the planner also adjusts the expected cost
> of Merge Join based on the expected performance gain.
>
> 4. Capability to build the bloom filter in parallel in case of a
> parallel SeqScan. This is done efficiently by populating a local bloom
> filter for each parallel worker and then taking a bitwise OR over all
> the local bloom filters to form a shared bloom filter at the end of
> the parallel SeqScan.
>
> 5. The optimization is GUC controlled, with the settings
> enable_mergejoin_semijoin_filter and force_mergejoin_semijoin_filter.
>
> We found in experiments that there is a significant improvement when
> using the bloom filter during Merge Join. One experiment involved
> joining two large tables while varying the theoretical filtering rate
> (TFR) between the two tables; the TFR is defined as the percentage by
> which the two datasets are disjoint. Both tables in the merge join
> were the same size. We varied the TFR to see the change in the
> filtering optimization.
>
> For example, imagine t0 has 10 million rows, containing the numbers 1
> through 10 million randomly shuffled, while t1 has the numbers 4
> million through 14 million randomly shuffled. Then the TFR for a join
> of these two tables is 40%, since 40% of each table is disjoint from
> the other (1 through 4 million for t0, 10 million through 14 million
> for t1).
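The merge step described in item 4 can be illustrated with a toy bloom filter; the filter size, hash scheme, and helper names below are mine for illustration, not the patch's implementation:

```python
import hashlib

FILTER_BYTES = 1024  # 8192 bits; sized arbitrarily for this sketch
NUM_HASHES = 3

def bit_positions(key):
    # Derive NUM_HASHES bit positions from one digest (toy scheme).
    digest = hashlib.sha256(str(key).encode()).digest()
    for i in range(NUM_HASHES):
        chunk = int.from_bytes(digest[i * 4:(i + 1) * 4], "big")
        yield chunk % (FILTER_BYTES * 8)

def add(filt, key):
    for pos in bit_positions(key):
        filt[pos // 8] |= 1 << (pos % 8)

def might_contain(filt, key):
    return all(filt[pos // 8] & (1 << (pos % 8)) for pos in bit_positions(key))

# Each "parallel worker" scans its chunk of the build side into a local filter.
chunks = [range(0, 100), range(100, 200), range(200, 300)]
local_filters = [bytearray(FILTER_BYTES) for _ in chunks]
for filt, chunk in zip(local_filters, chunks):
    for key in chunk:
        add(filt, key)

# Merge: a bitwise OR of the local filters yields the shared filter,
# which then answers membership for every key any worker saw.
shared = bytearray(FILTER_BYTES)
for filt in local_filters:
    for i in range(FILTER_BYTES):
        shared[i] |= filt[i]

# Bloom filters have no false negatives, so every inserted key must hit.
assert all(might_contain(shared, k) for k in range(300))
```

The OR-merge works because setting a bit in any local filter is exactly equivalent to setting it in the shared one, so the merged bit array is identical to what a single serial build would have produced.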
> Here is the performance test result joining two tables:
>
> TFR: theoretical filtering rate
> EFR: estimated filtering rate
> AFR: actual filtering rate
> HJ: hash join
> MJ Default: default merge join
> MJ Filter: merge join with bloom filter optimization enabled
> MJ Filter Forced: merge join with bloom filter optimization forced
>
> TFR   EFR    AFR    HJ    MJ Default  MJ Filter  MJ Filter Forced
> -----------------------------------------------------------------
> 10    33.46   7.41  6529  22638       21949      23160
> 20    37.27  14.85  6483  22290       21928      21930
> 30    41.32  22.25  6395  22374       20718      20794
> 40    45.67  29.7   6272  21969       19449      19410
> 50    50.41  37.1   6210  21412       18222      18224
> 60    55.64  44.51  6052  21108       17060      17018
> 70    61.59  51.98  5947  21020       15682      15737
> 80    68.64  59.36  5761  20812       14411      14437
> 90    77.83  66.86  5701  20585       13171      13200
>
> Table. Execution time (ms) vs. filtering rate (%) for joining two
> tables of 10M rows.
>
> Attached you can find figures of the same performance test and a SQL
> script to reproduce it.
>
> The first thing to notice is that Hash Join is generally the most
> efficient join strategy. This is because Hash Join is better at
> dealing with small tables, and our size of 10 million rows is still
> small enough that Hash Join outperforms the other join strategies.
> Future experiments can investigate much larger tables.
>
> However, comparing just the different Merge Join variants, we see
> that using the bloom filter greatly improves performance.
> Intuitively, all of these execution times follow linear paths.
> Comparing forced filtering versus default, we can see that default
> Merge Join outperforms Merge Join with filtering at low filter rates,
> but after about 20% TFR, Merge Join with filtering outperforms
> default Merge Join. This makes intuitive sense, as there are some
> fixed costs associated with building and probing the bloom filter.
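The headline claims can be checked directly against the table; this is plain arithmetic on the timings quoted above:

```python
# Timings (ms) taken from the quoted table, at 10% and 90% TFR.
mj_default = {10: 22638, 90: 20585}
mj_forced  = {10: 23160, 90: 13200}
mj_filter  = {10: 21949, 90: 13171}

# Worst case (10% TFR): forced filtering is only slightly slower.
slowdown = mj_forced[10] / mj_default[10] - 1
print(f"{slowdown:.1%}")  # 2.3%, i.e. "less than 5% slower"

# Best case (90% TFR): the filter cuts execution time by about 36%.
speedup = 1 - mj_filter[90] / mj_default[90]
print(f"{speedup:.1%}")  # 36.0%
```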
> In the worst case, at only 10% TFR, the bloom filter makes Merge Join
> less than 5% slower. However, in the best case, at 90% TFR, the bloom
> filter improves Merge Join by 36%.
>
> Based on the results of the above experiments, we came up with a
> linear equation for the performance ratio of using the filter
> pushdown as a function of the filtering rate. Based on the numbers
> presented in the figure, this is the equation:
>
> T_filter / T_no_filter = 1 / (0.83 * estimated filtering rate + 0.863)
>
> For example, this means that with an estimated filtering rate of 0.4,
> the execution time of merge join is estimated to improve by 16.3%.
> Note that the estimated filtering rate is used in the equation, not
> the theoretical or actual filtering rate, because it is what we have
> during planning. In practice the estimated filtering rate usually
> isn't accurate; in fact, it can differ from the theoretical filtering
> rate by as much as 17% in our experiments. One way to mitigate the
> loss of bloom filter effectiveness caused by an inaccurate estimated
> filtering rate is to adaptively turn the filter off at execution
> time; this is yet to be implemented.
>
> Here is a list of tasks we plan to work on to improve this patch:
>
> 1. More regression testing to guarantee correctness.
> 2. More performance testing involving larger tables and complicated
>    query plans.
> 3. Improve the cost model.
> 4. Explore runtime tuning such as making the bloom filter checking
>    adaptive.
> 5. Currently, only the best single join key is used for building the
>    bloom filter. However, if there are several keys and we know that
>    their distributions are somewhat disjoint, we could leverage this
>    fact and use multiple keys for the bloom filter.
> 6. Currently, bloom filter pushdown is only implemented for SeqScan
>    nodes. However, it would be possible to allow pushdown to other
>    types of scan nodes.
> 7. Explore whether the bloom filter could be pushed down through a
>    foreign scan when the foreign server is capable of handling it,
>    which could be made true for postgres_fdw.
> 8. Better EXPLAIN output on the usage of bloom filters.
>
> This patch set was prepared by Marcus Ma, Lyu Pan and myself. Feedback
> is appreciated.
>
> With Regards,
> Zheng Li
> Amazon RDS/Aurora for PostgreSQL
>
> [1] https://www.postgresql.org/message-id/flat/c902844d-837f-5f63-ced3-9f7fd222f175%402ndquadrant.com

Hi,

In the header of patch 1:

    In this prototype, the cost model is based on an assumption that
    there is a linear relationship between the performance gain from
    using a semijoin filter and the estimated filtering rate:
    % improvement to Merge Join cost = 0.83 * estimated filtering rate - 0.137.

How were the coefficients (0.83 and 0.137) determined? I guess they
were based on the results of running a certain workload.

Cheers
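For reference, the quoted ratio model can be checked numerically, and coefficients of this shape are typically recovered by a least-squares fit over (filtering rate, speedup) pairs from benchmark runs. A minimal sketch follows; the data points here are synthetic, generated from the model itself, not the actual workload's measurements:

```python
def predicted_ratio(efr):
    # T_filter / T_no_filter per the quoted cost model
    return 1 / (0.83 * efr + 0.863)

# Sanity check of the example in the mail: EFR = 0.4 -> ~16.3% improvement.
improvement = (1 - predicted_ratio(0.4)) * 100
print(round(improvement, 1))  # 16.3

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, in closed form."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Fitting T_no_filter / T_filter against the filtering rate over many
# benchmark runs would yield such coefficients. Synthetic demonstration:
xs = [0.1, 0.3, 0.5, 0.7, 0.9]
ys = [0.83 * x + 0.863 for x in xs]
a, b = fit_line(xs, ys)
print(round(a, 3), round(b, 3))  # 0.83 0.863
```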