On Mon, Sep 23, 2024 at 7:11 AM Alexander Korotkov <aekorot...@gmail.com> wrote: > Makes sense. Please, check the attached patch freeing the consts list > while returning NULL from match_orclause_to_indexcol().
Some review comments: I agree with the comments already given to the effect that the patch looks much better now. I was initially surprised to see this happening in match_clause_to_indexcol() but after studying it I think it looks like the right place. I think it makes sense to think about moving forward with this, although it would be nice to get Tom's take if we can. I see that the patch makes no update to the header comment for match_clause_to_indexcol() nor to the comment just above the cascade of if-statements. I think both need to be updated. More generally, many of the comments in this patch seem to just explain what the code does, and I'd like to reiterate my usual complaint: as far as possible, comments should explain WHY the code does what it does. Certainly, in some cases there's nothing to be said about that e.g. /* Lookup for operator to fetch necessary information for the SAOP node */ isn't really saying anything non-obvious but it's reasonable to have the comment here anyway. However, when there is something more interesting to be said, then we should do that rather than just reiterate what the reader who knows C can anyway see. For instance, the lengthy comment beginning with "Iterate over OR entries." could either be shorter and recapitulate less of the code that follows, or it could say something more interesting about why we're doing it like that. + /* We allow constant to be Const or Param */ + if (!IsA(constExpr, Const) && !IsA(constExpr, Param)) + break; This restriction is a lot tighter than the one mentioned in the header comment of match_clause_to_indexcol ("Our definition of const is exceedingly liberal"). If there's a reason for that, the comments should talk about it. If there isn't, it's better to be consistent. + /* + * Check operator is present in the opfamily, expression collation + * matches index collation. Also, there must be an array type in + * order to construct an array later. + */ + if (!IndexCollMatchesExprColl(index->indexcollations[indexcol], inputcollid) || + !op_in_opfamily(matchOpno, index->opfamily[indexcol]) || + !OidIsValid(arraytype)) + break; I spent some time wondering whether this was safe. The IndexCollMatchesExprColl() guarantees that either the input collation is equal to the index collation, or the index collation is 0. If the index collation is 0 then that I *think* that guarantees that the indexed type is non-collatable, but this could be a cross-type comparison, and it's possible that the other type is collatable. In that case, I don't think anything would prevent us from merging a bunch of OR clauses with different collations into a single SAOP. I don't really see how that could be a problem, because if the index is of a non-collatable type, then presumably the operator doesn't care about what the collation is, so it should all be fine, I guess? But I'm not very confident about that conclusion. I'm unclear what the current thinking is about the performance of this patch, both as to planning and as to execution. Do we believe that this transformation is a categorical win at execution-time? In theory, OR format alllows for short-circuit execution, but because of the Const-or-Param restriction above, I don't think that's mostly a non-issue. But maybe not completely, because I can see from the regression test changes that it's possible for us to apply this transformation when the Param is set by an InitPlan or SubPlan. If we have something like WHERE tenthous = 1 OR tenthous = (very_expensive_computation() + 1), maybe the patch could lose, because we'll have to do the very expensive calculation to evaluate the SAOP, and the OR could stop as soon as we establish that tenthous != 1. If we only did the transformation when the Param is an external parameter, then we wouldn't have this issue. Maybe this isn't worth worrying about; I'm not sure. Are there any other cases where the transformation can produce something that executes more slowly? As far as planning time is concerned, I don't think this is going to be too bad, because most of the work only needs to be done if there are OR-clauses, and my intuition is that the optimization will often apply in such cases, so it seems alright. But I wonder how much testing has been done of adversarial cases, e.g. lots of non-indexable clause in the query; or lots of OR clauses in the query but all of them turn out on inspection to be non-indexable. My expectation would be that there's no real problem here, but it would be good to verify that experimentally. -- Robert Haas EDB: http://www.enterprisedb.com