Well, that's the issue. I get back this result, but I don't get the other, simpler ones.
With the query "string~ matching~" I get:

Found 14 hits.
1. 1string 2matching
2. str_ing ma_tching
3. strang mutching
4. str4ing match2ing
5. string matching
6. string_m atching
7. string matching another token
8. strrring maatchinng
9. string matching123
10. string123 matching
11. strasding matc4hing ano23ther tok3en
12. str4ing maaatching_another 2t oken
13. strfffing_ m atcbbhing
14. string123 matching123

2013/2/9 Jack Krupansky <j...@basetechnology.com>

> You probably are not getting this document returned:
>
>     list.add("strfffing_ m atcbbhing");
>
> because... both terms have an edit distance greater than two.
>
> All the other documents have one or the other or both terms with an
> edit distance of 2 or less.
>
> Your query is essentially: Match a document if EITHER term matches. So, if
> NEITHER matches (within an edit distance of 2), the document is not a
> match.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Pierre Antoine DuBoDeNa
> Sent: Saturday, February 09, 2013 12:52 PM
> To: java-user@lucene.apache.org
> Subject: Re: fuzzy queries
>
> With a query like string~ matching~ (without specifying a threshold) I get 14
> results back.
>
> Could it be a problem with the analyzers?
>
> Here is the code:
>
>     private File indexDir = new File("/a-directory-here");
>     private StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
>     private IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35,
>             analyzer);
>
>     public static void main(String[] args) throws Exception {
>         IndexProfiles Indexer = new IndexProfiles();
>         IndexWriter w = Indexer.CreateIndex();
>
>         ArrayList<String> list = new ArrayList<String>();
>         list.add("string matching");
>         list.add("string123 matching");
>         list.add("string matching123");
>         list.add("string123 matching123");
>         list.add("str4ing match2ing");
>         list.add("1string 2matching");
>         list.add("str_ing ma_tching");
>         list.add("string_matching");
>         list.add("strang mutching");
>         list.add("strrring maatchinng");
>         list.add("strfffing_ m atcbbhing");
>         list.add("str2ing__mat3ching");
>         list.add("string_m atching");
>         list.add("string matching another token");
>         list.add("strasding matc4hing ano23ther tok3en");
>         list.add("str4ing maaatching_another 2t oken");
>
>         // One document per string, each with a single "Name" field.
>         for (String companyname : list) {
>             Indexer.addSingleField(w, companyname);
>         }
>
>         int numDocs = w.numDocs();
>         System.out.println("# of Docs in Index: " + numDocs);
>         w.close();
>
>         DoIndexQuery("string~ matching~");
>     }
>
>     public static void DoIndexQuery(String query) throws IOException,
>             ParseException {
>         IndexProfiles Indexer = new IndexProfiles();
>         IndexReader reader = Indexer.LoadIndex();
>         Indexer.SearchIndex(reader, query, 50);
>         reader.close();
>     }
>
>     public IndexWriter CreateIndex() throws IOException {
>         Directory index = FSDirectory.open(indexDir);
>         IndexWriter w = new IndexWriter(index, config);
>         return w;
>     }
>
>     public HashMap<Integer, String> SearchIndex(IndexReader w, String query,
>             int topk) throws IOException, ParseException {
>         // Parse with the same analyzer used at index time; default operator is OR.
>         Query q = new QueryParser(Version.LUCENE_35, "Name", analyzer).parse(query);
>
>         IndexSearcher searcher = new IndexSearcher(w);
>         TopScoreDocCollector collector = TopScoreDocCollector.create(topk, true);
>         searcher.search(q, collector);
>         ScoreDoc[] hits = collector.topDocs().scoreDocs;
>
>         System.out.println("Found " + hits.length + " hits.");
>         // Map each hit's doc id to its stored "Name" value.
>         HashMap<Integer, String> map = new HashMap<Integer, String>();
>         for (int i = 0; i < hits.length; ++i) {
>             int docId = hits[i].doc;
>             Document d = searcher.doc(docId);
>             map.put(docId, d.get("Name"));
>             System.out.println((i + 1) + ". " + d.get("Name"));
>         }
>
>         searcher.close();
>         return map;
>     }
>
>     public void addSingleField(IndexWriter w, String str) throws IOException {
>         Document doc = new Document();
>         // Stored (so it can be printed) and analyzed by StandardAnalyzer.
>         doc.add(new Field("Name", str, Field.Store.YES, Field.Index.ANALYZED));
>         w.addDocument(doc);
>     }
>
> 2013/2/9 Michael McCandless <luc...@mikemccandless.com>
>
>> Can you reduce your test case to indexing one document/field and
>> running a single FuzzyQuery (you seem to be running two at once,
>> OR'ing the results)?
>>
>> And show the complete standalone source code (e.g. what is topk?) so we
>> can see how you are indexing / building the Query / searching.
>>
>> The default minSim is 0.5.
>>
>> Note that 0.01 is not useful in practice: it (should) match nearly all
>> terms. But I agree it's odd one term is not matching.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Sat, Feb 9, 2013 at 5:20 AM, Pierre Antoine DuBoDeNa
>> <pad...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I use Lucene 3.6 and I'm trying to use fuzzy queries so that I can match
>>> many more results.
>>>
>>> For example, I am adding these strings:
>>>
>>>     list.add("string matching");
>>>     list.add("string123 matching");
>>>     list.add("string matching123");
>>>     list.add("string123 matching123");
>>>     list.add("str4ing match2ing");
>>>     list.add("1string 2matching");
>>>     list.add("str_ing ma_tching");
>>>     list.add("string_matching");
>>>     list.add("strang mutching");
>>>     list.add("strrring maatchinng");
>>>     list.add("strfffing_ m atcbbhing");
>>>     list.add("str2ing__mat3ching");
>>>     list.add("string_m atching");
>>>     list.add("string matching another token");
>>>     list.add("strasding matc4hing ano23ther tok3en");
>>>     list.add("str4ing maaatching_another 2t oken");
>>>
>>> Then I run the query:
>>>
>>>     "string~0.01 matching~0.01"
>>>
>>> and I get back these results:
>>>
>>> Found 15 hits.
>>> 1. 1string 2matching
>>> 2. str_ing ma_tching
>>> 3. string_m atching
>>> 4. strang mutching
>>> 5. str4ing match2ing
>>> 6. strrring maatchinng
>>> 7. string matching
>>> 8. strasding matc4hing ano23ther tok3en
>>> 9. string matching another token
>>> 10. string matching123
>>> 11. string123 matching
>>> 12. strfffing_ m atcbbhing
>>> 13. string123 matching123
>>> 14. str4ing maaatching_another 2t oken
>>> 15. string_matching
>>>
>>> So only one result is missing (with threshold 0.01): str2ing__mat3ching. Any
>>> idea why? How can I extend the query to catch this one as well?
>>>
>>> Also, what's the default threshold for the ~ operator? Without specifying a
>>> threshold I get 14 results; string_matching and str2ing__mat3ching are
>>> missing this time.
>>>
>>> Here is the code for the queries:
>>>
>>>     Query q = new QueryParser(Version.LUCENE_35, "Name", analyzer).parse(query);
>>>
>>>     IndexSearcher searcher = new IndexSearcher(w);
>>>     TopScoreDocCollector collector = TopScoreDocCollector.create(topk, true);
>>>     searcher.search(q, collector);
>>>     ScoreDoc[] hits = collector.topDocs().scoreDocs;
>>>
>>> Thanks for the help.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
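The following is a minimal, Lucene-free sketch in Java of the matching rule discussed in this thread, useful for checking by hand which documents a fuzzy OR query can reach. It rests on assumptions that should be checked against the FuzzyQuery/FuzzyTermEnum source for your 3.x version: with prefixLength 0, a candidate term is scored as 1 - d / min(queryLength, termLength), where d is the Levenshtein distance, and the term is accepted only when that score is strictly greater than minSim. It also approximates StandardAnalyzer by lowercasing and splitting on whitespace, assuming that for these ASCII strings digits and underscores stay inside a token. The class and method names (FuzzySimSketch, similarity, tokenize, matches, search) are hypothetical, made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration only; it does not use Lucene.
// Assumption: Lucene 3.x FuzzyQuery (prefixLength = 0) scores a term as
//   1 - levenshtein(query, term) / min(len(query), len(term))
// and accepts the term only if that score is strictly greater than minSim.
class FuzzySimSketch {

    // The 16 strings indexed in the thread, one document each.
    static final List<String> DOCS = List.of(
        "string matching", "string123 matching", "string matching123",
        "string123 matching123", "str4ing match2ing", "1string 2matching",
        "str_ing ma_tching", "string_matching", "strang mutching",
        "strrring maatchinng", "strfffing_ m atcbbhing", "str2ing__mat3ching",
        "string_m atching", "string matching another token",
        "strasding matc4hing ano23ther tok3en",
        "str4ing maaatching_another 2t oken");

    // Query terms from "string~ matching~"; QueryParser ORs the two clauses by default.
    static final String[] QUERY = {"string", "matching"};

    // Classic Levenshtein distance: insert, delete and substitute each cost 1.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed 3.x-style similarity; negative when d exceeds the shorter length.
    static float similarity(String queryTerm, String indexTerm) {
        int minLen = Math.min(queryTerm.length(), indexTerm.length());
        if (minLen == 0) return 0f; // guard for empty input; not taken from Lucene
        return 1.0f - (float) levenshtein(queryTerm, indexTerm) / minLen;
    }

    // Rough stand-in for StandardAnalyzer on these ASCII strings (assumption):
    // lowercase and split on whitespace, so digits and underscores stay in a token.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    // OR semantics: a document is a hit if any query term matches any of its tokens.
    static boolean matches(String doc, String[] queryTerms, float minSim) {
        for (String token : tokenize(doc)) {
            for (String q : queryTerms) {
                if (similarity(q, token) > minSim) return true;
            }
        }
        return false;
    }

    // Returns the documents that match, in indexing order.
    static List<String> search(List<String> docs, String[] queryTerms, float minSim) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            if (matches(doc, queryTerms, minSim)) hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        for (float minSim : new float[] {0.5f, 0.01f}) {
            List<String> hits = search(DOCS, QUERY, minSim);
            System.out.println("minSim=" + minSim + ": " + hits.size() + " hits");
            for (String doc : DOCS) {
                if (!hits.contains(doc)) System.out.println("  missing: " + doc);
            }
        }
    }
}
```

Under these assumptions, string_matching is a single token of length 15, so its best score is 1 - 7/8 = 0.125 against "matching": below the default 0.5 but above 0.01. str2ing__mat3ching is a single token of length 18 whose distance to either query term (12 to "string", 10 to "matching") exceeds the shorter term's length, so its score is negative and no positive threshold reaches it. Running main prints the hit count for each threshold and the documents left out; with these assumptions it reports 14 and 15 hits, in line with the results posted above.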