On 10/23/24 15:05, Robert Haas wrote:
> On Sat, Oct 19, 2024 at 6:00 AM Andrei Lepikhov <lepi...@gmail.com> wrote:
>> Generally, a hash value doesn't fully guarantee unique identification of
>> a node. Also, a RelOptInfo corresponds to a subtree in the final plan,
>> and it sometimes takes work to find which node in the partially executed
>> plan corresponds to a specific row-count estimate made during
>> selectivity estimation. Remember parameterised paths - you would need to
>> attach some signature to each path. So it is not a fully strict method.
>> If you are interested, I can perhaps explain the method a little bit
>> more at some meetup.
> Yeah, I agree that this is not the best method. While it's true that
> you could get a false match in case of a hash value collision, IMHO
> the bigger problem is that it seems like an expensive way of
> determining something that we really should know already. If the user
> types the same query, mentioning the same relations, in the same
> order, with the same constructs around them, it's hard to believe that
> hashing is the cheapest way of matching up the old and new ones. I'm
> not sure exactly what we should do instead, but it feels like we more
> or less have this information during parsing and then we lose track of
> it as the query goes through the rewrite and planning phases.
A single parse tree may be implemented by multiple different execution
plans, and even individual clauses can be transformed during
optimisation (remember the OR -> ANY conversion). Also, the cardinality
of a join in the middle of the tree depends on its inner and outer
subtrees. Because of that, computing a hash over a RelOptInfo's relids
and restrictions plus the hashes of its child RelOptInfos, and carrying
it through all later stages up to the end of execution, is the most
stable approach I know.
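
To make the idea concrete, here is a minimal standalone sketch of such a
signature scheme. It is not the actual patch and does not use PostgreSQL's
real structures; the struct, field names, and the FNV-1a hash are
simplifications chosen only to show how a subtree's signature folds in its
own relids and restrictions together with the signatures of its children:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a RelOptInfo subtree */
typedef struct PlanSubtree
{
    uint64_t    relids;          /* simplified bitmap of base relations */
    const char *restrictions;    /* canonical text of restriction clauses */
    struct PlanSubtree *outer;   /* child subtrees; NULL for base relations */
    struct PlanSubtree *inner;
} PlanSubtree;

/* FNV-1a, only to keep the sketch self-contained */
static uint64_t
hash_bytes64(const void *data, size_t len, uint64_t seed)
{
    const unsigned char *p = data;
    uint64_t    h = seed ^ 14695981039346656037ULL;
    size_t      i;

    for (i = 0; i < len; i++)
        h = (h ^ p[i]) * 1099511628211ULL;
    return h;
}

/*
 * Recursive signature: hash the node's own relids and restrictions, then
 * fold in the signatures of the child subtrees, so a join's signature stays
 * stable as long as its inputs and clauses stay the same.
 */
static uint64_t
subtree_signature(const PlanSubtree *node)
{
    uint64_t    h;
    uint64_t    child_sig;

    if (node == NULL)
        return 0;

    h = hash_bytes64(&node->relids, sizeof(node->relids), 0);
    if (node->restrictions != NULL)
        h = hash_bytes64(node->restrictions, strlen(node->restrictions), h);

    child_sig = subtree_signature(node->outer);
    h = hash_bytes64(&child_sig, sizeof(child_sig), h);
    child_sig = subtree_signature(node->inner);
    h = hash_bytes64(&child_sig, sizeof(child_sig), h);

    return h;
}

int
main(void)
{
    PlanSubtree a = {0x1, "a.x = 1", NULL, NULL};
    PlanSubtree b = {0x2, NULL, NULL, NULL};
    PlanSubtree join = {0x1 | 0x2, "a.id = b.id", &a, &b};

    printf("join signature: %016llx\n",
           (unsigned long long) subtree_signature(&join));
    return 0;
}

Because a join's signature depends only on its own properties and its
children's signatures, the same value can be recomputed at a later stage
and matched against the value stored during planning, e.g. when attributing
a row-count estimate to a node of the executed plan.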
--
regards, Andrei Lepikhov