On Sat, Dec 11, 2021 at 7:31 AM 曾文旌(义从) <wenjing....@alibaba-inc.com> wrote:
> > > ------------------原始邮件 ------------------ > *发件人:*Tomas Vondra <tomas.von...@enterprisedb.com> > *发送时间:*Wed Dec 8 11:26:35 2021 > *收件人:*曾文旌(义从) <wenjing....@alibaba-inc.com>, shawn wang < > shawn.wang...@gmail.com>, ggys...@gmail.com <ggys...@gmail.com>, > PostgreSQL Hackers <pgsql-hack...@postgresql.org> > *抄送:*wjzeng <wjzeng2...@gmail.com> > *主题:*Re: 回复:Re: Is it worth pushing conditions to sublink/subplan? > >> Hi, >> >> On 12/7/21 10:44, 曾文旌(义从) wrote: >> > Hi Hackers >> > >> > For my previous proposal, I developed a prototype and passed >> > regression testing. It works similarly to subquery's qual pushdown. >> > We know that sublink expands at the beginning of each level of >> > query. At this stage, The query's conditions and equivalence classes >> > are not processed. But after generate_base_implied_equalities the >> > conditions are processed, which is why qual can push down to >> > subquery but sublink not. >> > >> > My POC implementation chose to delay the sublink expansion in the >> > SELECT clause (targetList) and where clause. Specifically, it is >> > delayed after generate_base_implied_equalities. Thus, the equivalent >> > conditions already established in the Up level query can be easily >> > obtained in the sublink expansion process (make_subplan). >> > >> > For example, if the up level query has a.id = 10 and the sublink >> > query has a.id = b.id, then we get b.id = 10 and push it down to the >> > sublink quey. If b is a partitioned table and is partitioned by id, >> > then a large number of unrelated subpartitions are pruned out, This >> > optimizes a significant amount of Planner and SQL execution time, >> > especially if the partitioned table has a large number of >> > subpartitions and is what I want. >> > >> > Currently, There were two SQL failures in the regression test, >> > because the expansion order of sublink was changed, which did not >> > affect the execution result of SQL. >> > >> > Look forward to your suggestions on this proposal. >> > >> >> I took a quick look, and while I don't see / can't think of any problems >> with delaying it until after generating implied equalities, there seems >> to be a number of gaps. >> >> *Thank you for your attention.* >> >> 1) Are there any regression tests exercising this modified behavior? >> Maybe there are, but if the only changes are due to change in order of >> targetlist entries, that doesn't seem like a clear proof. >> >> It'd be good to add a couple tests exercising both the positive and >> negative case (i.e. when we can and can't pushdown a qual). >> >> *I added several samples to the regress(qual_pushdown_to_sublink.sql). * >> *and I >> used the partition table to show the plan status of qual being pushed down >> into sublink.* >> >> *Hopefully this will help you understand the details of this patch. Later, I >> will add more cases.* >> >> 2) apparently, contrib/postgres_fdw does crash like this: >> >> #3 0x000000000077b412 in adjust_appendrel_attrs_mutator >> (node=0x13f7ea0, context=0x7fffc3351b30) at appendinfo.c:470 >> 470 Assert(!IsA(node, SubLink)); >> (gdb) p node >> $1 = (Node *) 0x13f7ea0 >> (gdb) p *node >> $2 = {type = T_SubLink} >> >> Backtrace attached. >> >> *For the patch attached in the last email, I passed all the tests under >> src/test/regress.* >> *As you pointed out, there was a problem with regression under contrib(in >> contrib/postgres_fdw). * >> *This time I fixed it and the current patch (V2) can pass the >> check-world.* >> >> 3) various parts of the patch really need at least some comments, like: >> >> - try_push_outer_qual_to_sublink_query really needs some docs >> >> - new stuff at the end of initsplan.c >> >> *Ok, I added some comments and will >> add more. If you have questions about any details,* >> *please point them out directly.* >> >> 4) generate_base_implied_equalities >> >> shouldn't this >> >> if (ec->ec_processed) >> ; >> >> really be? >> >> if (ec->ec_processed) >> continue; >> >> *You are right. I fixed it.* >> >> 5) I'm not sure why we need the new ec_processed flag. >> >> >> *I did this to eliminate duplicate equalities from the two >> generate_base_implied_equalities calls* >> >> *1) I need the base equivalent expression generated after >> generate_base_implied_equalities,* >> *which is used to pushdown qual to sublink(lazy_process_sublinks)* >> >> *2) The expansion of sublink may result in an equivalent expression with >> parameters, such as a = $1,* >> *which needs to deal with the equivalence classes again.* >> >> *3) So, I added ec_processed and asked to process it again >> (generate_base_implied_equalities)* >> *after the equivalence class changed (add_eq_member/process_equivalence).* >> >> *Maybe you have a better suggestion, please let me know.* >> >> 6) So we now have lazy_process_sublink callback? Does that mean we >> expand sublinks in two places - sometimes lazily, sometimes not? >> >> *Yes, not all sublink is delayed. Let me explain this:* >> *1) I added a GUC switch enable_lazy_process_sublink. If it is turned off, >> all >> lazy process sublink will not happen,* >> >> *qual pushdown to sublink depend on lazy procee sublink, which means no >> quals will be pushed down.* >> *2) Even if enable_lazy_process_sublink = true >> If Query in this level contains some complex features,* >> *sublink in this level query will not try do qual pushdown. (see function >> query_has_sublink_try_pushdown_qual).* >> >> *I want to support a minimum subset first. Then consider complex features >> such as CTE/DML.* >> *3) Finally, under conditions 1 and 2, >> all kinds of sublink contained in the SELECT clause or* >> *WHERE clause will delays expansion and try pushdown qual. The >> sublink elsewhere in the SQL statement* >> *does not delay process.* >> >> *The current status meets my requirements for >> now. Of course, after this scheme is proved to be feasible, maybe* >> *we can discuss that all sublinks are processed by overall delay, just like >> qual pushdown to subquery.* >> >> *thanks* >> >> *Wenjing* >> >> >> >> regards >> >> -- >> Tomas Vondra >> EnterpriseDB: http://www.enterprisedb.com >> The Enterprise PostgreSQL Company >> > Hi, + /* The outer var could exist in any of the upper-level queries to find these roots */ to find these roots -> so find these roots + if (has_unexpand_sublink(root) && checkExprHasSubLink(node)) has_unexpand_sublink -> has_unexpanded_sublink + if (enable_lazy_process_sublink) + return true; The above can be simplified to: return enable_lazy_process_sublink; + if (checkExprHasSubLink(qual)) + { + qual = lazy_process_sublink_qual(root, qual); + newquals = lappend(newquals, qual); + } + else + newquals = lappend(newquals, qual); Since the lappend() is common to both branches, you can remove the else clause. In the if block, only call lazy_process_sublink_qual(). + /* under lazy process sublink, parent root may have some data that child not need, so set it to NULL */ + subroot->join_info_list = NIL; minor correction to the comment above: under lazy process sublink, parent root may have some data that child does not need, so set it to NIL +void +preprocess_qual_conditions(PlannerInfo *root, Node *jtnode, bool istop) Please add a comment explaining the meaning of istop. + if (istop) + f->quals = preprocess_expression_ext(root, f->quals, EXPRKIND_QUAL, false); + else + f->quals = preprocess_expression(root, f->quals, EXPRKIND_QUAL); I think the code would be more readable if you replace the preprocess_expression() call in else branch with call to preprocess_expression_ext(). + context->root->unexpand_sublink_counter++; unexpand_sublink_counter -> unexpanded_sublink_counter++ For sublink_query_push_qual(), the return at the end is not needed. For condition_is_safe_pushdown_to_sublink, you can initialize context this way : + equal_expr_info_context context = {0}; + if (cvar && cvar->varattno > 0 && equal(cvar, var)) + return true; The last few lines of condition_is_safe_pushdown_to_sublink() can be written as: return cvar && cvar->varattno > 0 && equal(cvar, var); + if (equal_expr_safety_check(node, &context)) + { + /* It needs to be something like outer var = inner var */ + if (context.inner_var && The nested if blocks can be merged into one if block. Cheers