On 24/12/2024 08:20, wenhui qiu wrote:
sysbench=# explain analyze  select userid from dba_users where  username like '%aaaaaaaaaaaa%' order by userid limit 1;
                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..408.59 rows=1 width=4) (actual time=1433.469..1433.470 rows=0 loops=1)    ->  Index Scan using dba_users_pkey on dba_users  (cost=0.43..244892.51 rows=600 width=4) (actual time=1433.468..1433.468 rows=0 loops=1)
          Filter: ((username)::text ~~ '%aaaaaaaaaaaa%'::text)
          Rows Removed by Filter: 6000000
  Planning Time: 0.384 ms
  Execution Time: 1433.489 ms
(6 rows)

The problem seems to be that the planner estimates the LIKE qual to match 600 rows, but in reality it matches 0.

I think the total_cost calculation method cannot be linear, because the data distribution is not linear in reality. I try to change it to a logarithmic

Why? I don't see anything wrong with the LIMIT estimation here.

One factor here is that the execution time of the Index Scan + Limit plan depends heavily on whether there are 0 matches, or more. With 0 matches, the Index Scans needs to scan the whole table. With 1 rows, it needs to scan just half of the table on average before finding the match. With 10 rows, just 10 % of the table, and so forth. So the execution time is indeed highly non-linear depending on whether there are any matches or not. Is that what you meant?

It'd be nice to somehow take that non-linearity into account in the estimation. But I don't know how, and I don't think that change in LIMIT estimation is the right way.

The reality may be more complicated, I mean there is no better way for the community leaders to solve this problem, because the community will never say no to the hint, and there are so many such problems

It's a difficult topic because "hints" mean different things to different people. Meanwhile, we have added things like "ALTER FUNCTION ... ROWS <rows>" and support functions which can do more accurate estimation of functions. That's a kind of a hint. If someone comes up with more ways to help the planner with selectivity estimation, it's worth considering.

Or maybe you could improve the selectivity estimator of the LIKE operator to be more accurate to begin with.

--
Heikki Linnakangas
Neon (https://neon.tech)



Reply via email to