Hi Sergei, Thanks again for your comments.
I have added a comment on the jira ticket including the link to a patch that addresses your comments. I will take one more look at it tomorrow morning before sending it to you for review, but I thought maybe it makes sense to respond now (see below) given the timezone difference. > Hi Yuchen, > > Please find below review input for > > > commit ba9cf8efeb0b1e1c132d565adb79c98156783f15 > > Author: Yuchen Pei <yuchen....@mariadb.com> > > Date: Fri May 19 20:06:31 2023 +1000 > > > > MDEV-22534 Decorrelate IN subquery > > > > First, general questions: > > Q1: It seems there is some overlap between > Item_in_subselect::exists2in_processor() and > Item_exists_subselect::exists2in_processor() Did you consider factoring > out common code rather than copying it? (if yes, I'd like to know why > it couldn't be done). Done. I factored out the common code into three methods: - Item_exists_subselect::exists2in_prepare() - Item_exists_subselect::exists2in_create_or_update_in() - Item_exists_subselect::exists2in_and_is_not_nulls() In doing so I tried to not alter the logic of the exists2in transformation, while keeping the decorrelate-in transformation simple. For things done in exists2in but not in my decorrelate-in implementation, I add a substype() check. > Q2: Where is the coe that removes the correlated equalities from the > subquery's WHERE? (I assume Item_exists_subselect code does it?) Naively, the best place to remove the equalities would be in find_inner_outer_equalities(), when it iterates over the conds and one could simply call the remove() method on the iterator. But we can't just do that because after the call to find_inner_outer_equalities() we need to do more checks, as well as replacing the equalities with `IS NOT NULL`s in case of upper_not to ensure the 3vl value of the subquery remains the same (see my previous response about three value logic). > > diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc > > index 9e6c205ca76..8d893794da3 100644 > > --- a/sql/item_subselect.cc > > +++ b/sql/item_subselect.cc > > @@ -3099,6 +3099,132 @@ static bool find_inner_outer_equalities(Item > > **conds, > > return TRUE; > > } > > > > +/* Checks whether item tree intersects with the free list */ > > +static bool intersects_free_list(Item *item, THD *thd) > > +{ > > + for (const Item *to_find= thd->free_list; to_find; to_find= > > to_find->next) > > + if (item->walk(&Item::find_item_processor, 1, (void *) to_find)) > > + return true; > > + return false; > > +} > > + > > Please try moving this function into a separate file. First name that came to > my mind: opt_subq_decorrelate.cc Feel free to suggest better names. The idea > is that we should aim for better structure. > (This request is only valid if the factoring out suggested above doesn't > work out). Replied previously. BTW why should this function be in a separate file? Is it because it is not a method of any of the Item_subselect classes? > > +/* De-correlate where conditions in an IN subquery */ > > This needs a bigger comment describing what the rewrite does. Done and replied previously > > +bool Item_in_subselect::exists2in_processor(void *opt_arg) > > +{ > > + THD *thd= (THD *)opt_arg; > > + SELECT_LEX *first_select= unit->first_select(); > > + JOIN *join= first_select->join; > > + bool will_be_correlated; > > + Dynamic_array<EQ_FIELD_OUTER> eqs(PSI_INSTRUMENT_MEM, 5, 5); > > + List<Item> outer; > > + Query_arena *arena= NULL, backup; > > + int res= FALSE; > > + DBUG_ENTER("Item_in_subselect::exists2in_processor"); > > + > > Please move comments out of the if statement and number the conditions. > > For examples, see best_access_path(), large if-statement starting with > "Don't test table scan if it can't be better." > or > "EXISTS-to-IN coversion and ORDER BY ... LIMIT clause" > in Item_exists_subselect::exists2in_processor(). Done and replied previously > > + if (!optimizer_flag(thd, OPTIMIZER_SWITCH_EXISTS_TO_IN) || > > + /* proceed only if I'm a toplevel IN or a toplevel NOT IN */ > > + !(is_top_level_item() || > > + (upper_not && upper_not->is_top_level_item())) || > > what is the reason behind this condition? > If we're able to handle top-level NOT IN, why can't we handle subquery in any > context? > > Can we really handle the NOT IN? (I suspect not, it will suffer from the > semantics of IN subqueries with NULLs.. > I can't find the name for this, it's described e.g. here: > https://petrunia.net/2006/11/16/inany-subqueries-null-woes/ ) Done and replied previously > > > + first_select->is_part_of_union() || /* skip if part of a union */ > > + first_select->group_list.elements || /* skip if with group by */ > > + first_select->with_sum_func || /* skip if aggregation */ > > + join->having || /* skip if with having */ > > + !join->conds || /* skip if no conds */ > > + /* skip if left expr is a single nonscalar subselect */ > > + (left_expr->type() == Item::SUBSELECT_ITEM && > > + !left_expr->type_handler()->is_scalar_type())) > > The above seems to be some exotic SQL. Is it something like > > (select col1,lol2 from t1) IN (select col3,col4 from t2 where ..) > > and does MariaDB really support this? Please add a comment about > this. Done and replied previously > > > + DBUG_RETURN(FALSE); > > + /* iterate over conditions, and check whether they can be moved out. */ > > + if (find_inner_outer_equalities(&join->conds, eqs)) > > + DBUG_RETURN(FALSE); > > + /* If we are in the execution of a prepared statement, check for > > + intersection with the free list to avoid segfault. Note that the > > + check for prepared statement execution is necessary, otherwise it > > + will likely always find intersection thus abort the > > + transformation. > > + > > + fixme: this only works when HAVE_PSI_STATEMENT_INTERFACE is > > + defined. It causes CI's like amd64-debian-10-debug-embedded to > > + fail. Are there other ways to find out we are in the execution of a > > + prepared statement? */ > > I'm looking at the code of activate_stmt_arena_if_needed(). It seems > "stmt_arena->is_conventional()" is the check you're looking for. Done and replied previously > > > + if (thd->m_statement_state.m_parent_prepared_stmt) > > + { > > + for (uint i= 0; i < (uint) eqs.elements(); i++) > > + { > > + if (intersects_free_list(*eqs.at(i).eq_ref, thd)) > > + DBUG_RETURN(FALSE); > > This check doesn't look good. > As far as I understand, it is a temporary measure until related > re-execution bug(s) are fixed by Igor (and/or Sanja)? Replied previously > > + } > > + } > > > + /* Determines whether the result will be correlated */ > > + { > > + List<Item> unused; > > + Collect_deps_prm prm= {&unused, // parameters > > + unit->first_select()->nest_level_base, // nest_level_base > > + 0, // count > > + unit->first_select()->nest_level, // nest_level > > + FALSE // collect > > + }; > > + walk(&Item::collect_outer_ref_processor, TRUE, &prm); > > + DBUG_ASSERT(prm.count > 0); > > + DBUG_ASSERT(prm.count >= (uint)eqs.elements()); > > + will_be_correlated= prm.count > (uint)eqs.elements(); > See the example pasted in the MDEV: > https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=259705&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-259705 > > Somehow, I get here will_be_correlated==FALSE but Materialization is still not > considered for the subquery? Done and replied in the ticket > > > + } > > + /* Back up the free list */ > > + arena= thd->activate_stmt_arena_if_needed(&backup); > > + /* Construct the items for left_expr */ > > + if (left_expr->type() == Item::ROW_ITEM) > > + for (uint i= 0; i < left_expr->cols(); i++) > > + outer.push_back(left_expr->element_index(i)); > > + else > > + outer.push_back(left_expr); > > + const uint offset= first_select->item_list.elements; > > + /* Move items to outer and select item list */ > > + for (uint i= 0; i < (uint)eqs.elements(); i++) > > + { > > + Item **eq_ref= eqs.at(i).eq_ref; > > + Item_ident *local_field= eqs.at(i).local_field; > > + Item *outer_exp= eqs.at(i).outer_exp; > > + first_select->item_list.push_back(local_field, thd->mem_root); > > + first_select->ref_pointer_array[offset + i]= (Item *)local_field; > > How do we know that ref_pointer_array has enough room for the > new elements? > (We can discuss this with Sanja, he's an expert on the topic). > > I see this check in Item_exists_subselect::exists2in_processor > > if ((uint)eqs.elements() > (first_select->item_list.elements + > first_select->select_n_reserved)) > > Is it how Item_exists_subselect handles this issue? Yes, I think so, done Best, Yuchen _______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp