Thanks for your thoughts. For the case where it isn't known if the case expression itself is indexed, technically that should be added as a decision-node in the query planner. After all there are 2 possibilities to handle that so it should be up to the planner to choose the cheapest.
Having said that, if the time spent planning the query is *that* critical I agree that it probably isn't worth it. Just that in my line of work the execution time of a query is a lot of orders of magnitude larger than the planning time (my recordholder is a query that runs for just over 3 days...) On Fri, Jun 5, 2020 at 4:31 PM Tom Lane <t...@sss.pgh.pa.us> wrote: > David Rowley <dgrowle...@gmail.com> writes: > > On Fri, 5 Jun 2020 at 14:41, Paul van der Linden > > <paul.doskabou...@gmail.com> wrote: > >> If I have a query like: > >> > >> SELECT * FROM ( > >> SELECT > >> CASE > >> WHEN field='value1' THEN 1 > >> WHEN field='value2' THEN 2 > >> END AS category > >> FROM table1 > >> ) AS foo > >> WHERE category=1 > >> > >> doesn't use the index on field, while technically it could do that. > >> Is it hard to implement drilling down the constant in the WHERE to > within the CASE? > > > It doesn't look impossible to improve that particular case. See > > eval_const_expressions_mutator() in clauses.c at T_CaseExpr. However, > > this would need to take constant folding further than we take it > > today. Today we just have the ability to simplify expressions which > > are, by themselves, an expression which will always evaluate to a > > constant value. This case is more complex as it requires something > > outside the CASE expr to allow the simplification to take place. In > > this case, we'd need to look at the other side of the OpExpr to see > > the const there before any transformation could simplify it. > > I'd tend to see this as a transformation rule that acts on equality- > with-a-CASE-input, thereby avoiding the "action at a distance" problem. > > > It's > > also not entirely clear that the simplification would always be a good > > idea. What, for example if there was an index on the case statement > > but none on "field". The query may perform worse! > > FWIW, I'm not too fussed about that objection. If we rejected new > optimizations on the basis that somebody's optimized-for-the-old-way > query might perform worse, almost no planner changes would ever get in. > I think most people would feel that an optimization like this is an > improvement. (I recall coming across a similar case in an > information_schema query just a few days ago.) The hard questions > I would ask are > 1. Is the transformation actually correct? > 2. Does it improve queries often enough to be worth the planning cycles > expended to look for the optimization? > > As far as #1 goes, note that this CASE produces NULL if "field" is > neither 'value1' nor 'value2', whereupon the equality operator would > also produce NULL, so that simplifying to "field='value1'" is not > formally correct: that would produce FALSE not NULL for other values > of "field". We can get away with the replacement anyway at the top > level of WHERE, but not in other contexts. Hence, it'd be wrong to > try to make this transformation in eval_const_expressions(), which is > applied to all expressions. Possibly prepqual.c's canonicalize_qual() > would be a better place. > > The real problem here is going to be objection #2. The rules under > which any optimization could be applied are nontrivial, so that we'd > spend quite a bit of time trying to figure out whether the optimization > applies ... and I'm afraid that most of the time it would not. > > regards, tom lane >