Re: adjust_limit_rows_costs algorithm

Heikki Linnakangas Wed, 25 Dec 2024 08:11:58 -0800

On 24/12/2024 08:20, wenhui qiu wrote:

sysbench=# explain analyze select userid from dba_users where usernamelike '%aaaaaaaaaaaa%' order by userid limit 1;
                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..408.59 rows=1 width=4) (actualtime=1433.469..1433.470 rows=0 loops=1) -> Index Scan using dba_users_pkey on dba_users (cost=0.43..244892.51 rows=600 width=4) (actualtime=1433.468..1433.468 rows=0 loops=1)
          Filter: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
          Rows Removed by Filter: 6000000
  Planning Time: 0.384 ms
  Execution Time: 1433.489 ms
(6 rows)

The problem seems to be that the planner estimates the LIKE qual tomatch 600 rows, but in reality it matches 0.

I think the total_cost calculation method cannot be linear, because thedata distribution is not linear in reality. I try to change it to alogarithmic


Why? I don't see anything wrong with the LIMIT estimation here.

One factor here is that the execution time of the Index Scan + Limitplan depends heavily on whether there are 0 matches, or more. With 0matches, the Index Scans needs to scan the whole table. With 1 rows, itneeds to scan just half of the table on average before finding thematch. With 10 rows, just 10 % of the table, and so forth. So theexecution time is indeed highly non-linear depending on whether thereare any matches or not. Is that what you meant?

It'd be nice to somehow take that non-linearity into account in theestimation. But I don't know how, and I don't think that change in LIMITestimation is the right way.

The reality may be more complicated, I mean there is no better way forthe community leaders to solve this problem, because the community willnever say no to the hint, and there are so many such problems

It's a difficult topic because "hints" mean different things todifferent people. Meanwhile, we have added things like "ALTER FUNCTION... ROWS <rows>" and support functions which can do more accurateestimation of functions. That's a kind of a hint. If someone comes upwith more ways to help the planner with selectivity estimation, it'sworth considering.

Or maybe you could improve the selectivity estimator of the LIKEoperator to be more accurate to begin with.


--
Heikki Linnakangas
Neon (https://neon.tech)

Re: adjust_limit_rows_costs algorithm

Reply via email to