On Thu, Oct 13, 2022 at 7:30 AM Zhihong Yu <z...@yugabyte.com> wrote:
> > > On Wed, Oct 12, 2022 at 4:35 PM Zhihong Yu <z...@yugabyte.com> wrote: > >> >> >> On Wed, Oct 12, 2022 at 3:36 PM Lyu Pan <lyu.steve....@gmail.com> wrote: >> >>> Hello Zhihong Yu & Tomas Vondra, >>> >>> Thank you so much for your review and feedback! >>> >>> We made some updates based on previous feedback and attached the new >>> patch set. Due to time constraints, we didn't get to resolve all the >>> comments, and we'll continue to improve this patch. >>> >>> > In this prototype, the cost model is based on an assumption that there >>> is >>> > a linear relationship between the performance gain from using a >>> semijoin >>> > filter and the estimated filtering rate: >>> > % improvement to Merge Join cost = 0.83 * estimated filtering rate - >>> 0.137. >>> > >>> > How were the coefficients (0.83 and 0.137) determined ? >>> > I guess they were based on the results of running certain workload. >>> >>> Right, the coefficients (0.83 and 0.137) determined are based on some >>> preliminary testings. The current costing model is pretty naive and >>> we'll work on a more robust costing model in future work. >>> >>> >>> > I agree, in principle, although I think the current logic / formula is >>> a >>> > bit too crude and fitted to the simple data used in the test. I think >>> > this needs to be formulated as a regular costing issue, considering >>> > stuff like cost of the hash functions, and so on. >>> > >>> > I think this needs to do two things: >>> > >>> > 1) estimate the cost of building the bloom filter - This shall depend >>> on >>> > the number of rows in the inner relation, number/cost of the hash >>> > functions (which may be higher for some data types), etc. >>> > >>> > 2) estimate improvement for the probing branch - Essentially, we need >>> to >>> > estimate how much we save by filtering some of the rows, but this also >>> > neeeds to include the cost of probing the bloom filter. >>> > >>> > This will probably require some improvements to the lib/bloomfilter, in >>> > order to estimate the false positive rate - this may matter a lot for >>> > large data sets and small work_mem values. The bloomfilter library >>> > simply reduces the size of the bloom filter, which increases the false >>> > positive rate. At some point it'll start reducing the benefit. >>> > >>> >>> These suggestions make a lot of sense. The current costing model is >>> definitely not good enough, and we plan to work on a more robust >>> costing model as we continue to improve the patch. >>> >>> >>> > OK. Could also build the bloom filter in shared memory? >>> > >>> >>> We thought about this approach but didn't prefer this one because if >>> all worker processes share the same bloom filter in shared memory, we >>> need to frequently lock and unlock the bloom filter to avoid race >>> conditions. So we decided to have each worker process create its own >>> bloom filter. >>> >>> >>> > IMHO we shouldn't make too many conclusions from these examples. Yes, >>> it >>> > shows merge join can be improved, but for cases where a hashjoin works >>> > better so we wouldn't use merge join anyway. >>> > >>> > I think we should try constructing examples where either merge join >>> wins >>> > already (and gets further improved by the bloom filter), or would lose >>> > to hash join and the bloom filter improves it enough to win. >>> > >>> > AFAICS that requires a join of two large tables - large enough that >>> hash >>> > join would need to be batched, or pre-sorted inputs (which eliminates >>> > the explicit Sort, which is the main cost in most cases). >>> > >>> > The current patch only works with sequential scans, which eliminates >>> the >>> > second (pre-sorted) option. So let's try the first one - can we invent >>> > an example with a join of two large tables where a merge join would >>> win? >>> > >>> > Can we find such example in existing benchmarks like TPC-H/TPC-DS. >>> > >>> >>> Agreed. The current examples are only intended to show us that using >>> bloom filters in merge join could improve the merge join performance >>> in some cases. We are working on testing more examples that merge join >>> with bloom filter could out-perform hash join, which should be more >>> persuasive. >>> >>> >>> > The bloom filter is built by the first seqscan (on t0), and then used >>> by >>> > the second seqscan (on t1). But this only works because we always run >>> > the t0 scan to completion (because we're feeding it into Sort) before >>> we >>> > start scanning t1. >>> > >>> > But when the scan on t1 switches to an index scan, it's over - we'd be >>> > building the filter without being able to probe it, and when we finish >>> > building it we no longer need it. So this seems pretty futile. >>> > >>> > It might still improve plans like >>> > >>> > -> Merge Join >>> > Merge Cond: (t0.c1 = t1.c1) >>> > SemiJoin Filter Created Based on: (t0.c1 = t1.c1) >>> > SemiJoin Estimated Filtering Rate: 1.0000 >>> > -> Sort >>> > Sort Key: t0.c1 >>> > -> Seq Scan on t0 >>> > -> Index Scan on t1 >>> > >>> > But I don't know how common/likely that actually is. I'd expect to have >>> > an index on both sides, but perhaps I'm wrong. >>> > >>> > This is why hashjoin seems like a more natural fit for the bloom >>> filter, >>> > BTW, because there we have a guarantee the inner relation is processed >>> > first (so we know the bloom filter is fine and can be probed). >>> > >>> >>> Great observation. The bloom filter only works if the first SeqScan >>> always runs to completion before the second SeqScan starts. >>> I guess one possible way to avoid futile bloom filter might be >>> enforcing that the bloom filter only be used if both the outer/inner >>> plans of the MergeJoin are Sort nodes, to guarantee the bloom filter >>> is ready to use after processing one side of the join, but this may be >>> too restrictive. >>> >>> >>> > I don't know what improvements you have in mind exactly, but I think >>> > it'd be good to show which node is building/using a bloom filter, and >>> > then also some basic stats (size, number of hash functions, FPR, number >>> > of probes, ...). This may require improvements to lib/bloomfilter, >>> which >>> > currently does not expose some of the details. >>> > >>> >>> Along with the new patch set, we have added information to display >>> which node is building/using a bloom filter (as well as the >>> corresponding expressions), and some bloom filter basic stats. We'll >>> add more related information (e.g. FPR) as we modify lib/bloomfilter >>> implementation in future work. >>> >>> >>> Thanks again for the great reviews and they are really useful! More >>> feedback is always welcome and appreciated! >>> >>> Regards, >>> Lyu Pan >>> Amazon RDS/Aurora for PostgreSQL >>> >>> Hi, >> For v1-0001-Support-semijoin-filter-in-the-planner-optimizer.patch : >> >> + >> + /* want at least 1000 rows_filtered to avoid any nasty edge cases >> */ >> + if (force_mergejoin_semijoin_filter || >> + (filtering_rate >= 0.35 && rows_filtered > 1000 && >> best_filter_clause >= 0)) >> >> Currently rows_filtered is compared with a constant, should the constant >> be made configurable ? >> >> + * improvement of 19.5%. This equation also concludes thata a >> 17% >> >> Typo: thata >> >> + inner_md_array = palloc(sizeof(SemiJoinFilterExprMetadata) * num_md); >> + if (!outer_md_array || !inner_md_array) >> + { >> + return 0; /* a stack array allocation failed */ >> >> Should the allocated array be freed before returning ? >> >> For verify_valid_pushdown(), the parameters in comment don't match the >> actual parameters. Please modify the comment. >> >> For is_fk_pk(), since the outer_key_list is fixed across the iterations, >> I think all_vars can be computed outside of expressions_match_foreign_key(). >> >> Cheers >> > > Continuing > with v1-0001-Support-semijoin-filter-in-the-planner-optimizer.patch > > For get_switched_clauses(), the returned List contains all the clauses. > Yet the name suggests that only switched clauses are returned. > Please rename the method to adjust_XX or rearrange_YY so that it is less > likely to cause confusion. > > For depth_of_semijoin_target(), idx is only used inside the `if (num_exprs > == get_appendrel_occluded_references(` block, you can move its declaration > inside that if block. > > + outside_subq_rte = root->simple_rte_array[outside_subq_var->varno]; > + > + /* System Vars have varattno < 0, don't bother */ > + if (outside_subq_var->varattno <= 0) > + return 0; > > Since the check on outside_subq_var->varattno doesn't depend on > outside_subq_rte, you can move the assignment to outside_subq_rte after the > if check. > > Cheers > For v1-0002-Support-semijoin-filter-in-the-executor-for-non-para.patch, I have a few comments. + + if (IsA(node, SeqScanState) && ((SeqScanState *) node)->apply_semijoin_filter + && !ExecScanUsingSemiJoinFilter((SeqScanState *) node, econtext)) + { + /* slot did not pass SemiJoinFilter, so skipping it. */ + ResetExprContext(econtext); + continue; + } + return projectedSlot; Since projectedSlot is only returned if slot passes SemiJoinFilter, you can move the call to `ExecProject` after the above if block. + List *semijoin_filters; /* SemiJoinFilterJoinNodeState */ + List *sj_scan_data; /* SemiJoinFilterScanNodeState */ Better use plural form in the comments. Cheers