On 28 April 2014 15:01, Robert Haas <robertmh...@gmail.com> wrote:
> Interesting.  This can pretty obviously produce a big win if things
> break our way.  But it'll take some effort to determine whether the
> range of possible values for the join column is narrow enough to make
> the inferred BETWEEN clause selective enough to be useful, and that
> effort will be wasted if it turns out that the answer is no.  I can
> certainly think of times when this would have helped me, so I kind of
> like the idea in theory ... but I won't be surprised if it turns out
> to be possible to construct realistic test cases where trying to
> figure this out causes too much increase in planning time.
>
> Actually, it seems possible to me that this could end up winning even
> if it doesn't enable the use of an index scan rather than a sequential
> scan, because eliminating rows during the scan is typically much
> cheaper than eliminating them during the join step.  But it also seems
> possible that it could lose heavily in that case, if the extra steps
> during the scan phase don't eliminate enough rows to reduce the join
> cost to a sufficient degree.  I'm not really sure how it will work out
> on balance.
I agree it's not always a win and that the planning overhead is a little high. That's why I suggest we only attempt to calculate such a plan when the normal plan for the scan is already high cost; that way the extra planning is worth it. I'm sure there are other optimizations that we might wish to consider only for plans that are otherwise high cost. This need hardly touch the main planning path for short queries at all.

The extra run-time cost of the test adds 2*cpu_operator_cost to the scan. However, it will save at least (OldSelectivity - NewSelectivity)*(cpu_tuple_cost + cpu_operator_cost) in the join node directly above it (and possibly more above that, though without being able to inspect the join tree we won't know that until later, and the output plan might actually change the later join plan anyway). So we can calculate the threshold at which we should choose the modified plan, though it will likely be a conservative value. The new selectivity can be calculated just as other selectivities are; I see no problem there.

So we have a proposed way to limit planning time in cases where the benefit would be low, plus a way to estimate the reduction in cost once we add the new tests.

The real value is whether we can use the additional tests to pick up an index we hadn't been able to use before. Given that MinMax index use will be a win even for fairly high selectivities (because of the low cost of index access), it looks like this optimization will play very nicely with MinMax.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
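[Editor's note: a back-of-envelope sketch of the break-even arithmetic described above, using PostgreSQL's default cost-constant values. The function name and structure are illustrative, not actual planner code; the saving formula simply applies the per-tuple estimate from the message to every scanned row.]

```python
# Rough check of when the inferred BETWEEN test pays off, using
# PostgreSQL's default cost constants. Illustrative only, not planner code.
CPU_OPERATOR_COST = 0.0025   # default cpu_operator_cost
CPU_TUPLE_COST = 0.01        # default cpu_tuple_cost

def between_test_pays_off(rows_scanned, old_selectivity, new_selectivity):
    """True if the estimated saving in the join node directly above
    exceeds the extra cost of the two added comparisons per row."""
    # Two extra comparisons (>= lower AND <= upper) for each scanned row.
    extra_scan_cost = rows_scanned * 2 * CPU_OPERATOR_COST
    # Each row that no longer reaches the join saves at least one tuple
    # handling plus one operator evaluation (the lower bound from the text).
    saving = (rows_scanned * (old_selectivity - new_selectivity)
              * (CPU_TUPLE_COST + CPU_OPERATOR_COST))
    return saving > extra_scan_cost

# e.g. a 1M-row scan where the BETWEEN cuts selectivity from 0.9 to 0.1:
print(between_test_pays_off(1_000_000, 0.9, 0.1))  # True: saving dominates
```

With the defaults this says the test pays off once (OldSelectivity - NewSelectivity) exceeds 2*cpu_operator_cost / (cpu_tuple_cost + cpu_operator_cost) = 0.4, which matches the point that the threshold is conservative since savings higher in the join tree are ignored.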