On Thu, Nov 20, 2014 at 1:30 PM, Peter Geoghegan <p...@heroku.com> wrote:
> I mean the suggestion of raising the cost threshold more gradually,
> not as a step function of the number of characters in the string [1]
> where it's either over 6 characters and must pass the 50% test, or
> isn't and has no absolute quality test. The exact modification I
> described will FWIW remove the "quantity" for "qty" suggestion, as
> well as all the similar suggestions that you found objectionable (like
> "tit" also offering a suggestion of "quantity").
>
> If you look at the regression tests, none of the sensible suggestions
> are lost (some would be by an across the board 50% absolute quality
> threshold, as I previously pointed out [2]), but all the bad ones are.
> I attach failed regression test output showing the difference between
> the previous expected values, and actual values with that small
> modification - it looks like most or all bad cases are now fixed.

That does seem to give better results, but it still seems awfully
complicated. If we just used Levenshtein with all-default cost
factors and a distance cap equal to Max(strlen(what_user_typed),
strlen(candidate_match), 3), what cases that you think are important
would be harmed?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company