Hello all,

Like the title says, using "gin_fuzzy_search_limit" degrades performance when
it is set to a relatively low value.

What do I mean by "relatively low"? An example would be a table with a GIN
index that has many millions of rows, where a particular keyword search has
1,000,000 possible results because the keyword is very common (or simply
because the table is so large that even a somewhat common keyword appears
often enough to return one million results). However, you only want to return
around 100 random results from that one million, so you set
gin_fuzzy_search_limit to 100. That limit is relatively low when you look at
the ratio of the limit to the number of possible results: 100 / 1,000,000 =
0.0001. You'll find the query is very slow at such a low ratio. It isn't
nearly as slow if gin_fuzzy_search_limit is still 100 but the keyword search
has only 10,000 possible results (a higher ratio of 0.01).
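
To make the scenario concrete, here is a minimal sketch of the kind of setup
I mean (the table, column, and keywords are only illustrative; they are not
taken from the attached test script):

    -- a tsvector column with a GIN index
    CREATE TABLE docs (id serial PRIMARY KEY, body tsvector);
    CREATE INDEX docs_body_gin ON docs USING gin (body);

    -- suppose 'common' matches ~1,000,000 rows
    SET gin_fuzzy_search_limit = 100;   -- ratio 100 / 1,000,000 = 0.0001
    EXPLAIN ANALYZE
      SELECT id FROM docs WHERE body @@ to_tsquery('common');

    -- the same limit against a keyword with only ~10,000 matches
    -- (ratio 0.01) is much faster
    EXPLAIN ANALYZE
      SELECT id FROM docs WHERE body @@ to_tsquery('rarer');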

This would explain why the documentation says "From experience, values in the
thousands (e.g., 5000 — 20000) work well". It's not very common to have
queries that return result sets large enough for gin_fuzzy_search_limit
values between 5,000 and 20,000 to produce low ratios, so the performance
issue I've observed rarely shows up there (those values give relatively high
ratios, between 0.005 and 0.02, when a keyword search has 1,000,000 results).
However, if you want a lower gin_fuzzy_search_limit such as 100, while also
having a relatively large table, you'll run into this slowness.

I discussed this issue, and the reason for it, in more detail in my original
bug report:
https://www.postgresql.org/message-id/16220-1a0a4f0cb67ca...@postgresql.org

Attached is SQL to test and observe this issue, along with a patch I
eventually want to submit to a commitfest.

Best regards,
Adé

Attachment: gin_fuzzy_search_limit_test.sql
Attachment: ginget.patch
