Quoting Martin Rode <[EMAIL PROTECTED]>: > Hi everybody, > > Has anyone tried to code a solution like Google's "Did you mean?" in > Lucene? > > I would be very happy to hear your ideas, approaches, suggestions.
I know that what Google does is look at consecutive queries by the same user that are similar. If the two queries are very similar, with only one or two characters changed, there is a very high probability that one of the query is a correct spelling while the other is a "common" misspelling. Its easy to figure which is the correct spelling by looking up the words in a dictionary. All they have to do now is add the mispelt store the correct and mispelt word pair in a mapping table and reference that table for every query. Of course, this only works because Google's huge query volume ensures that they can get sufficient quantities of such query pairs. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]