Hi all,
I tried to figure out how the fuzzy search implementation in Apache Lucene works and I'm kinda stuck here. --> Version : Apache Lucene 3.0.1 (JAVA) [What I want / need] I'm looking for a way to combine a prefix-, fuzzy- and wildcard query. Q: Is it possible to have a query like "user_input~0.5*" ? [JavaDoc for org.apache.lucene.search.FuzzyQuery c-tor] @param minimumSimilarity: a value between 0 and 1 to set the required similarity between the query term and the matching terms. For example, for a minimumSimilarity of 0.5 a term of the same length as the query term is considered similar to the query term if the edit distance between both terms is less than length(term)*0.5 Q: Mh... what if the query term differs in it's length to the term in my document? [Test case] I have written a small test program (JUnit test case) to explain my problem / confusion in detail: --> http://eugeneciurana.com/pastebin/pastebin.php?show=42619 [Examples] Search term --> Subset of expected result Cinamo~0.5 --> Cinema, Cinnamon [works] Strawbarr~0.8 --> Strawberry [doesn't work] --> As far as I understand, the "Edit distance" (aka "Levinshtein distance") between "Strawbarr" and "Strawberry" is 2 (one replacement and one insertion to transform "Strawbarr" into "Strawberry") The query "Strawbarr~0.8" in my opinion (and from what I read from the JavaDocs) should work just fine, because len(Strawbarr)*0.8 == 9*0.8 == 7.2 ... 7.2 >= 2 ... still -- it doesn't work. Is that, because the length of the search term and the word in my document differ? I already searched the wiki, the mailing list archive and had a look in all the "obvious" places but had no luck so far. If I am missing something obvious here I would be glad to receive some pointers into the right direction. <-- Regards -benjamin- -- Benjamin Jung <bpj...@terreon.de> Terreon, http://terreon.de/ Tel.: +49 (0)69 / 8484 65 37 Fax: +49 (0)6054 / 909 788 2 Mobil +49 (0)1577 / 159 788 3 --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org