On Thu, Aug 3, 2023 at 12:47 PM Alena Rybakina <lena.riback...@yandex.ru> wrote: > It's all right. I understand your position)
Okay, good to know. :-) > I also agree to try to find other optimization cases and generalize them. Good idea. Since the real goal is to "get a working flow of information", the practical value of trying to get it working with other clauses seems to be of secondary importance. > To be honest, I tried to fix it many times by calling the function to > calculate selectivity, and each time the result of the estimate did not > change. I didn't have any problems in this part after moving the > transformation to the parsing stage. I even tried to perform this > transformation at the planning stage (to the preprocess_qual_conditions > function), but I ran into the same problem there as well. > > To tell the truth, I think I'm ready to investigate this problem again > (maybe I'll be able to see it differently or really find that I missed > something in previous times). The optimizer will itself do a limited form of "normalizing to CNF". Are you familiar with extract_restriction_or_clauses(), from orclauses.c? Comments above the function have an example of how this can work: * Although a join clause must reference multiple relations overall, * an OR of ANDs clause might contain sub-clauses that reference just one * relation and can be used to build a restriction clause for that rel. * For example consider * WHERE ((a.x = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45)); * We can transform this into * WHERE ((a.x = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45)) * AND (a.x = 42 OR a.x = 44) * AND (b.y = 43 OR b.z = 45); * which allows the latter clauses to be applied during the scans of a and b, * perhaps as index qualifications, and in any case reducing the number of * rows arriving at the join. In essence this is a partial transformation to * CNF (AND of ORs format). It is not complete, however, because we do not * unravel the original OR --- doing so would usually bloat the qualification * expression to little gain. Of course this immediately makes me wonder: shouldn't your patch be able to perform an additional transformation here? You know, by transforming "a.x = 42 OR a.x = 44" into "a IN (42, 44)"? Although I haven't checked for myself, I assume that this doesn't happen right now, since your patch currently performs all of its transformations during parsing. I also noticed that the same comment block goes on to say something about "clauselist_selectivity's inability to recognize redundant conditions". Perhaps that is relevant to the problems you were having with selectivity estimation, back when the code was in preprocess_qual_conditions() instead? I have no reason to believe that there should be any redundancy left behind by your transformation, so this is just one possibility to consider. Separately, the commit message of commit 25a9e54d2d says something about how the planner builds RestrictInfos, which seems possibly-relevant. That commit enhanced extended statistics for OR clauses, so the relevant paragraph describes a limitation of extended statistics with OR clauses specifically. I'm just guessing, but it still seems like it might be relevant to the problem you ran into with selectivity estimation. Another possibility to consider. BTW, I sometimes use RR to help improve my understanding of the planner: https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Recording_Postgres_using_rr_Record_and_Replay_Framework The planner has particularly complicated control flow, which has unique challenges -- just knowing where to begin can be difficult (unlike most other areas). I find that setting watchpoints to see when and where the planner modifies state using RR is far more useful than it would be with regular GDB. Once I record a query, I find that I can "map out" what happens in the planner relatively easily. -- Peter Geoghegan