On Fri, Sep 30, 2022 at 3:44 PM Zheng Li <zhengl...@gmail.com> wrote:
Hello,

A bloom filter provides early filtering of rows that cannot be joined,
before they reach the join operator; the optimization is also known as
a semi join filter (SJF) pushdown. Such a filter can be created when
one child of the join operator must materialize its derived table
before the other child is evaluated.

For example, a bloom filter can be created from the join keys of the
build/inner side of a hash join, or of the outer side of a merge join.
The bloom filter can then be used to pre-filter rows on the other side
of the join operator during the scan of the base relation. The thread
"Hash Joins vs. Bloom Filters / take 2" [1] is a good discussion of
using this optimization for hash join, without going into pushdown of
the filter, which could increase the performance gain further.

We worked on prototyping bloom filter pushdown for both hash join and
merge join. Attached is a patch set for bloom filter pushdown for
merge join. We also plan to send the patch for hash join once we have
it rebased.

Here is a summary of the patch set:

1. Bloom filter pushdown optimizes merge join by filtering rows early,
during the table scan, instead of later on.
   - The bloom filter is pushed down along the execution tree to the
     target SeqScan nodes.
   - Experiments show that this optimization can speed up merge join
     by up to 36%.

2. The planner decides whether to use the bloom filter based on the
estimated filtering rate and the expected performance gain.
   - The planner accomplishes this by estimating four numbers per
     variable: the total number of rows of the relation, the number
     of distinct values of the variable, and the minimum and maximum
     values of the variable (when applicable). Using these numbers,
     the planner estimates the filtering rate of a potential filter
     (a rough sketch of such an estimate follows this list).
   - Because creating and applying the filter adds overhead, there is
     a minimum filtering rate below which the filter is not
     worthwhile. Based on testing, we check whether the estimated
     filtering rate is higher than 35%, and that informs our decision
     to use a filter or not.

3. If a bloom filter is used, the planner also adjusts the expected
cost of merge join based on the expected performance gain.

4. Capability to build the bloom filter in parallel in case of a
parallel SeqScan. This is done efficiently by populating a local bloom
filter in each parallel worker and then taking a bitwise OR over all
the local bloom filters to form a shared bloom filter at the end of
the parallel SeqScan (see the second sketch below).

5. The optimization is GUC-controlled, with the settings
enable_mergejoin_semijoin_filter and force_mergejoin_semijoin_filter.
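To make item 2 concrete, here is a minimal, self-contained sketch of
one plausible range-overlap heuristic: for an integer join key assumed
uniform on its [min, max] range, estimate the fraction of the probe
side that falls outside the build side's range. This only illustrates
how min/max statistics could feed a filtering-rate estimate; it is not
the patch's actual cost model, and all names here are made up.

    #include <stdio.h>

    /*
     * Fraction of probe-side values that lie outside the build side's
     * [min, max] range, assuming a uniform distribution of the key.
     */
    static double
    estimate_filtering_rate(double probe_min, double probe_max,
                            double build_min, double build_max)
    {
        double probe_span = probe_max - probe_min;
        double miss_low = build_min > probe_min ? build_min - probe_min : 0.0;
        double miss_high = probe_max > build_max ? probe_max - build_max : 0.0;

        if (probe_span <= 0.0)
            return 0.0;
        if (miss_low + miss_high >= probe_span)
            return 1.0;             /* the ranges are disjoint */
        return (miss_low + miss_high) / probe_span;
    }

    int main(void)
    {
        /* t0 = 1..10M probing a filter built from t1 = 4M..14M */
        double rate = estimate_filtering_rate(1, 10000000,
                                              4000000, 14000000);
        printf("estimated filtering rate: %.0f%%\n", rate * 100.0); /* ~40% */
        /* the planner would use the filter only above the 35% threshold */
        return 0;
    }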
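And for item 4, a self-contained sketch of the per-worker build and
bitwise-OR merge. Again, the filter layout, the two-hash scheme, the
sizing, and all names are illustrative rather than the patch's actual
code.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define FILTER_BITS  (1 << 20)           /* illustrative filter size */
    #define FILTER_WORDS (FILTER_BITS / 64)

    typedef struct { uint64_t words[FILTER_WORDS]; } BloomFilter;

    /* Two cheap hashes; a real filter would derive k independent hashes. */
    static uint64_t hash1(uint64_t k) { k *= 0x9E3779B97F4A7C15ULL; return k ^ (k >> 32); }
    static uint64_t hash2(uint64_t k) { k *= 0xC2B2AE3D27D4EB4FULL; return k ^ (k >> 29); }

    static void bloom_add(BloomFilter *f, uint64_t key)
    {
        uint64_t h1 = hash1(key) % FILTER_BITS;
        uint64_t h2 = hash2(key) % FILTER_BITS;
        f->words[h1 / 64] |= 1ULL << (h1 % 64);
        f->words[h2 / 64] |= 1ULL << (h2 % 64);
    }

    static int bloom_maybe_contains(const BloomFilter *f, uint64_t key)
    {
        uint64_t h1 = hash1(key) % FILTER_BITS;
        uint64_t h2 = hash2(key) % FILTER_BITS;
        return (f->words[h1 / 64] & (1ULL << (h1 % 64))) &&
               (f->words[h2 / 64] & (1ULL << (h2 % 64)));
    }

    /* Merge a worker's local filter into the shared one: a bitwise OR. */
    static void bloom_merge(BloomFilter *shared, const BloomFilter *local)
    {
        for (int i = 0; i < FILTER_WORDS; i++)
            shared->words[i] |= local->words[i];
    }

    int main(void)
    {
        BloomFilter *w1 = calloc(1, sizeof(BloomFilter));
        BloomFilter *w2 = calloc(1, sizeof(BloomFilter));
        BloomFilter *shared = calloc(1, sizeof(BloomFilter));

        bloom_add(w1, 42);       /* worker 1 saw join key 42 */
        bloom_add(w2, 4242);     /* worker 2 saw join key 4242 */
        bloom_merge(shared, w1);
        bloom_merge(shared, w2);

        printf("42: %d, 4242: %d, 7: %d\n",
               bloom_maybe_contains(shared, 42),
               bloom_maybe_contains(shared, 4242),
               bloom_maybe_contains(shared, 7)); /* 1, 1, almost surely 0 */
        return 0;
    }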
We found in experiments that there is a significant improvement when
using the bloom filter during merge join. One experiment joined two
large tables while varying the theoretical filtering rate (TFR)
between the two tables; the TFR is defined as the percentage by which
the two datasets are disjoint. Both tables in the merge join were the
same size. We varied the TFR to measure its effect on the filtering
optimization.

For example, imagine t0 has 10 million rows, containing the numbers 1
through 10 million randomly shuffled, while t1 has the numbers 4
million through 14 million randomly shuffled. Then the TFR for a join
of these two tables is 40%, since 40% of each table is disjoint from
the other table (1 through 4 million for t0, 10 million through 14
million for t1).

Here is the performance test result joining two tables:
TFR: theoretical filtering rate
EFR: estimated filtering rate
AFR: actual filtering rate
HJ: hash join
MJ Default: default merge join
MJ Filter: merge join with bloom filter optimization enabled
MJ Filter Forced: merge join with bloom filter optimization forced

TFR   EFR     AFR     HJ     MJ Default   MJ Filter   MJ Filter Forced
----------------------------------------------------------------------
10    33.46    7.41   6529   22638        21949       23160
20    37.27   14.85   6483   22290        21928       21930
30    41.32   22.25   6395   22374        20718       20794
40    45.67   29.70   6272   21969        19449       19410
50    50.41   37.10   6210   21412        18222       18224
60    55.64   44.51   6052   21108        17060       17018
70    61.59   51.98   5947   21020        15682       15737
80    68.64   59.36   5761   20812        14411       14437
90    77.83   66.86   5701   20585        13171       13200
Table. Execution Time (ms) vs Filtering Rate (%) for Joining Two
Tables of 10M Rows.

Attached you can find figures of the same performance test and a SQL
script to reproduce it.

The first thing to notice is that hash join is generally the most
efficient join strategy. This is because hash join is better at
dealing with small tables, and 10 million rows is still small enough
that hash join outperforms the other join strategies. Future
experiments can investigate much larger tables.

However, comparing just the merge join variants, we see that using
the bloom filter greatly improves performance. Intuitively, all of
these execution times follow linear trends. Comparing forced
filtering against the default, the default merge join outperforms
merge join with filtering at low filtering rates, but above roughly
20% TFR, merge join with filtering wins. This makes intuitive sense,
as there are fixed costs associated with building and probing the
bloom filter. In the worst case, at only 10% TFR, the bloom filter
makes merge join less than 5% slower; in the best case, at 90% TFR,
it improves merge join by 36%.

Based on the results of the above experiments, we fitted a linear
equation for the performance ratio of filter pushdown as a function
of the filtering rate. Based on the numbers presented in the figure:

    T_filter / T_no_filter = 1 / (0.83 * estimated filtering rate + 0.863)

For example, with an estimated filtering rate of 0.4, the execution
time of merge join is estimated to improve by 16.3%. Note that the
equation uses the estimated filtering rate, not the theoretical or
actual filtering rate, because that is what is available during
planning. In practice the estimated filtering rate is usually not
accurate; in our experiments it differed from the theoretical
filtering rate by as much as 17%. One way to mitigate the loss of
filtering power caused by an inaccurate estimate is to adaptively
turn the filter off at execution time; this is yet to be implemented.
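The 16.3% figure follows directly from the equation; here is a tiny
sketch of the arithmetic (the function name is ours):

    #include <stdio.h>

    /* Predicted T_filter / T_no_filter for a given estimated filtering
     * rate, per the linear model above. */
    static double predicted_time_ratio(double est_filtering_rate)
    {
        return 1.0 / (0.83 * est_filtering_rate + 0.863);
    }

    int main(void)
    {
        double ratio = predicted_time_ratio(0.4);
        /* 1 / (0.83*0.4 + 0.863) = 1 / 1.195 ~= 0.837, i.e. ~16.3% faster.
         * Equivalently, (T_no_filter - T_filter) / T_filter
         * = 0.83*rate + 0.863 - 1 = 0.83*rate - 0.137, matching the
         * "% improvement to Merge Join cost" formula in the patch header. */
        printf("improvement at 40%% EFR: %.1f%%\n", (1.0 - ratio) * 100.0);
        return 0;
    }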
Here is a list of tasks we plan to work on to improve this patch:

1. More regression testing to guarantee correctness.
2. More performance testing involving larger tables and more
complicated query plans.
3. Improve the cost model.
4. Explore runtime tuning, such as making the bloom filter check
adaptive.
5. Currently, only the best single join key is used for building the
bloom filter. However, if there are several keys and we know that
their distributions are somewhat disjoint, we could leverage this
fact and use multiple keys for the bloom filter.
6. Currently, bloom filter pushdown is only implemented for SeqScan
nodes. However, it would be possible to allow pushdown to other types
of scan nodes.
7. Explore whether the bloom filter could be pushed down through a
foreign scan when the foreign server is capable of handling it, which
could be made true for postgres_fdw.
8. Better EXPLAIN output on the usage of bloom filters.

This patch set is prepared by Marcus Ma, Lyu Pan and myself. Feedback
is appreciated.

With Regards,
Zheng Li
Amazon RDS/Aurora for PostgreSQL

[1] https://www.postgresql.org/message-id/flat/c902844d-837f-5f63-ced3-9f7fd222f175%402ndquadrant.com


On Fri, Sep 30, 2022 at 8:40 PM Zhihong Yu <z...@yugabyte.com> wrote:

Hi,
In the header of patch 1:

    In this prototype, the cost model is based on an assumption that
    there is a linear relationship between the performance gain from
    using a semijoin filter and the estimated filtering rate:
    % improvement to Merge Join cost = 0.83 * estimated filtering rate - 0.137

How were the coefficients (0.83 and 0.137) determined?
I guess they were based on the results of running a certain workload.

Cheers


Hi,
For patch 1:

+bool enable_mergejoin_semijoin_filter;
+bool force_mergejoin_semijoin_filter;

How would (enable_mergejoin_semijoin_filter = off,
force_mergejoin_semijoin_filter = on) be interpreted?
Have you considered using one GUC with three values: off, enabled,
forced? (A sketch of what that might look like is at the end of this
message.)

+    mergeclauses_for_sjf = get_actual_clauses(path->path_mergeclauses);
+    mergeclauses_for_sjf = get_switched_clauses(path->path_mergeclauses,
+                                                path->jpath.outerjoinpath->parent->relids);

mergeclauses_for_sjf is assigned twice, and I don't see
mergeclauses_for_sjf being referenced in the call to
get_switched_clauses(). Is this intentional?

+    /* want at least 1000 rows_filtered to avoid any nasty edge cases */
+    if (force_mergejoin_semijoin_filter || (filteringRate >= 0.35 && rows_filtered > 1000))

The above condition is narrower than the enclosing condition. Since
there is no else block for the second if block, please merge the two
if statements.

+    int best_filter_clause;

Normally I would think a `clause` is represented by a List *, but
best_filter_clause is an int. Please use another variable name so
that there is less chance of confusion.

For evaluate_semijoin_filtering_rate():

+    double best_sj_selectivity = 1.01;

How was 1.01 determined?

+    debug_sj1("SJPD: start evaluate_semijoin_filtering_rate");

There are debug statements in these functions. It would be better to
remove them in the next patch set.

Cheers
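Regarding the single three-valued GUC suggested above, here is a
sketch of what it might look like, following the pattern of existing
enum GUCs such as plan_cache_mode. Everything here, including the
names, is only a placeholder, not a concrete proposal for the patch:

    #include "utils/guc.h"      /* for struct config_enum_entry */

    typedef enum
    {
        SEMIJOIN_FILTER_OFF,
        SEMIJOIN_FILTER_ON,
        SEMIJOIN_FILTER_FORCED
    } SemijoinFilterMode;

    static int mergejoin_semijoin_filter_mode = SEMIJOIN_FILTER_OFF;

    static const struct config_enum_entry mergejoin_semijoin_filter_options[] = {
        {"off", SEMIJOIN_FILTER_OFF, false},
        {"on", SEMIJOIN_FILTER_ON, false},
        {"forced", SEMIJOIN_FILTER_FORCED, false},
        {NULL, 0, false}
    };

    /* ...and a corresponding entry in guc_tables.c's ConfigureNamesEnum[]: */
    {
        {"mergejoin_semijoin_filter", PGC_USERSET, QUERY_TUNING_METHOD,
         gettext_noop("Controls semijoin filter pushdown for merge join."),
         NULL},
        &mergejoin_semijoin_filter_mode,
        SEMIJOIN_FILTER_OFF,
        mergejoin_semijoin_filter_options,
        NULL, NULL, NULL
    },

This would remove the ambiguous (off, on) combination entirely, since
the mode is a single value rather than two independent booleans.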