We have started implementing named entity recognition on top of AnalyzingSuggester, which offers great support for synonyms, stopwords, etc. To do this, we slightly modified AnalyzingSuggester.lookup() to return only the exactFirst hits (we run only the exactFirst code block and skip the 'sameSurfaceForm' check and break, so that the synonym hits are returned too).
This works pretty well, and our next step would be to add some fuzziness to cope with spelling mistakes. The idea was to do exactly the same, but with FuzzySuggester instead. Now we have the problem that 'EXACT_FIRST' in FuzzySuggester does not rely only on sharing the same prefix: different/misspelled terms within the edit distance are also considered 'not exact', which means we get the same results as with AnalyzingSuggester.

query: "screen"
misspelled query: "screan"
dictionary: "screen", "screensaver"

AnalyzingSuggester hits: screen, screensaver
AnalyzingSuggester hits on misspelled query: <empty>
AnalyzingSuggester EXACT_FIRST hits: screen
AnalyzingSuggester EXACT_FIRST hits on misspelled query: <empty>

FuzzySuggester hits: screen, screensaver
FuzzySuggester hits on misspelled query: screen, screensaver
FuzzySuggester EXACT_FIRST hits: screen
FuzzySuggester EXACT_FIRST hits on misspelled query: <empty>
=> TARGET: screen

Is there a way to distinguish these cases? I see that the 'exact' criterion relies on an FST aspect, 'END_BYTE arc leaving'. Maybe these could be set differently when building the Levenshtein automata? I have no clue.
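
To make the target behaviour concrete, here is a minimal plain-Java sketch of the semantics we are after. It does not use Lucene and is not how FuzzySuggester works internally: the class FuzzyExactMatch and its methods are hypothetical names, the dictionary is scanned by brute force instead of intersecting a Levenshtein automaton with the FST, and a maximum of 1 edit is an assumption. A dictionary entry counts as a "fuzzy exact" hit only if its whole surface form, not just a prefix of it, lies within the edit distance of the whole query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: brute-force "fuzzy exact" matching.
final class FuzzyExactMatch {

    // Classic two-row Levenshtein distance (insert, delete, substitute;
    // transpositions are NOT counted as a single edit here).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Keeps only entries whose complete form is within maxEdits of the query,
    // so longer entries that merely share a (fuzzy) prefix are dropped.
    static List<String> match(String query, List<String> dictionary, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String entry : dictionary) {
            if (editDistance(query, entry) <= maxEdits) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of("screen", "screensaver");
        System.out.println(match("screen", dictionary, 1)); // prints [screen]
        System.out.println(match("screan", dictionary, 1)); // prints [screen]
    }
}
```

With the example above, both "screen" and "screan" return only "screen", which is the TARGET row. Whether the same distinction can be made inside the suggester's FST traversal is exactly the open question.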