On Mon, 28 Sep 2020 at 13:22, David Rowley <dgrowle...@gmail.com> wrote: > I'm more leaning towards the fact that the planner picked the seqscan > path in the first place. Unfotunately I don't have any great ideas on > how to fix without fudging the costs a bit. Maybe we can make > var_eq_non_const() divide down the selectivity by: > > ndistinct = Max(get_variable_numdistinct(vardata, &isdefault), > DEFAULT_NUM_DISTINCT); > > instead of : > > ndistinct = get_variable_numdistinct(vardata, &isdefault); > > But that's going to make the row estimate in this case 50 rows (10000 > / 200). With a subplan that the stats say can only either return 10000 > or 0 rows. Is that too weird?
I figured it was easier to have a discussion around a patch, so, for discussion purposes, I wrote one and attached it. The patch is along the lines of what I mentioned above, so aims to be a more generic fix for this problem rather than something that only aims to change EXISTS type subqueries. It affects both of the examples that I mentioned in my previous email. So we now get an index scan in the following case; postgres=# set plan_cache_mode = force_generic_plan; SET postgres=# prepare q1(int) as select b from t where b = $1 limit 1; PREPARE postgres=# explain (analyze, costs off, timing off) execute q1(1); QUERY PLAN ------------------------------------------------------------------ Limit (actual rows=1 loops=1) -> Index Only Scan using t_b_idx on t (actual rows=1 loops=1) Index Cond: (b = $1) Heap Fetches: 0 Planning Time: 0.166 ms Execution Time: 1.772 ms (6 rows) and we get a hashed subplan in the other case I mentioned. (Which does win out over the Index Scan subplan in fix_alternative_subplan()) While it is great to fix both those cases, this patch could have repercussions later in planning. The selectivity estimate for these parameter equality scans is now going to be at most 1.0 / 200.0, where before it could have been 1.0 given an ndistinct estimate of 1. There's probably more smarts that could be added, like checking if the parameter is from a Var and if there's a foreign key to verify that we'll never pass in a parameter value that's not in the table. I'm not sure how hard that will be exactly. I see we do at least pass down the PlannerInfo into the operator's oprcode function, which is a good start, but I suspect it'll still be tricky to do this efficiently. Any opinions on this? David
encourage_safer_plans_for_parameter_equality_planing.patch
Description: Binary data