Hello all,

Like the title says, setting "gin_fuzzy_search_limit" degrades query speed when its value is relatively low.
What do I mean by "relatively low"? Take a table with a GIN index that has many millions of rows, where a particular keyword search has 1,000,000 possible results because the keyword is very common (or simply because the table is so large that even a moderately common keyword matches a million rows). You only want around 100 random results out of that million, so you set gin_fuzzy_search_limit to 100. That limit is relatively low when you look at the ratio of the limit to the number of possible results: 100 / 1,000,000 = 0.0001. For such a low ratio, the query is very slow. It is not nearly as slow when gin_fuzzy_search_limit is 100 but the keyword search has only 10,000 possible results in total (a higher ratio of 0.01).

This would explain why the documentation says that "From experience, values in the thousands (e.g., 5000 — 20000) work well". It is not common to have queries whose result sets are large enough for gin_fuzzy_search_limit values between 5,000 and 20,000 to produce low ratios, so the performance issue I have observed rarely shows up at those settings (with 1,000,000 results for a keyword search, those values still give relatively high ratios of 0.005 to 0.02). However, if you want a lower gin_fuzzy_search_limit such as 100 on a relatively large table, you will run into this slowness.

I discussed the issue, and the reason for it, in more detail in my original bug report:
https://www.postgresql.org/message-id/16220-1a0a4f0cb67ca...@postgresql.org

Attached is SQL to test and observe the issue, along with a patch that I want to eventually submit to a commitfest.

Best regards,
Adé
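P.S. For anyone who wants a quick feel for the scenario without opening the attachment, below is a minimal sketch along the lines described above. Table, column, and data names here are illustrative only; the attached gin_fuzzy_search_limit_test.sql is the authoritative version and may differ in its details.

-- Build a table where one keyword matches roughly 1,000,000 rows.
CREATE TABLE gin_limit_test (id serial PRIMARY KEY, body tsvector);

INSERT INTO gin_limit_test (body)
SELECT to_tsvector('english', 'commonword filler' || g)
FROM generate_series(1, 1000000) AS g;

CREATE INDEX gin_limit_test_idx ON gin_limit_test USING gin (body);

-- Encourage the planner to use the GIN index instead of a sequential scan.
SET enable_seqscan = off;

-- Low ratio: 100 / 1,000,000 = 0.0001 (the slow case described above).
SET gin_fuzzy_search_limit = 100;
EXPLAIN ANALYZE
SELECT count(*) FROM gin_limit_test WHERE body @@ to_tsquery('commonword');

-- Higher ratio: 10,000 / 1,000,000 = 0.01 (noticeably faster).
SET gin_fuzzy_search_limit = 10000;
EXPLAIN ANALYZE
SELECT count(*) FROM gin_limit_test WHERE body @@ to_tsquery('commonword');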
gin_fuzzy_search_limit_test.sql
ginget.patch