Does anyone have an idea what's happening? I've tried 4-5 different queries and many thresholds, but I can't get all the results back.

2013/2/9 Pierre Antoine DuBoDeNa <pad...@gmail.com>

> Well, that's the issue. I get back this result, but I don't get other,
> simpler ones.
>
> With the query "string~ matching~" I get:
>
> Found 14 hits.
> 1. 1string 2matching
> 2. str_ing ma_tching
> 3. strang mutching
> 4. str4ing match2ing
> 5. string matching
> 6. string_m atching
> 7. string matching another token
> 8. strrring maatchinng
> 9. string matching123
> 10. string123 matching
> 11. strasding matc4hing ano23ther tok3en
> 12. str4ing maaatching_another 2t oken
> 13. strfffing_ m atcbbhing
> 14. string123 matching123
>
> 2013/2/9 Jack Krupansky <j...@basetechnology.com>
>
>> You probably are not getting this document returned:
>>
>> list.add("strfffing_ m atcbbhing");
>>
>> because... both terms have an edit distance greater than two.
>>
>> All the other documents have one or the other or both terms with an
>> editing distance of 2 or less.
>>
>> Your query is essentially: Match a document if EITHER term matches. So,
>> if NEITHER matches (within an editing distance of 2), the document is
>> not a match.
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: Pierre Antoine DuBoDeNa
>> Sent: Saturday, February 09, 2013 12:52 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: fuzzy queries
>>
>> With a query like string~ matching~ (without specifying a threshold) I
>> get 14 results back.
>>
>> Can it be a problem with the analyzers?
>>
>> Here is the code:
>>
>> private File indexDir = new File("/a-directory-here");
>>
>> private StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
>>
>> private IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35,
>>         analyzer);
>>
>> public static void main(String[] args) throws Exception {
>>     IndexProfiles Indexer = new IndexProfiles();
>>     IndexWriter w = Indexer.CreateIndex();
>>
>>     ArrayList<String> list = new ArrayList<String>();
>>     list.add("string matching");
>>     list.add("string123 matching");
>>     list.add("string matching123");
>>     list.add("string123 matching123");
>>     list.add("str4ing match2ing");
>>     list.add("1string 2matching");
>>     list.add("str_ing ma_tching");
>>     list.add("string_matching");
>>     list.add("strang mutching");
>>     list.add("strrring maatchinng");
>>     list.add("strfffing_ m atcbbhing");
>>     list.add("str2ing__mat3ching");
>>     list.add("string_m atching");
>>     list.add("string matching another token");
>>     list.add("strasding matc4hing ano23ther tok3en");
>>     list.add("str4ing maaatching_another 2t oken");
>>
>>     for (String companyname : list) {
>>         Indexer.addSingleField(w, companyname);
>>     }
>>
>>     int numDocs = w.numDocs();
>>     System.out.println("# of Docs in Index: " + numDocs);
>>     w.close();
>>
>>     DoIndexQuery("string~ matching~");
>> }
>>
>> public static void DoIndexQuery(String query) throws IOException,
>>         ParseException {
>>     IndexProfiles Indexer = new IndexProfiles();
>>     IndexReader reader = Indexer.LoadIndex();
>>
>>     Indexer.SearchIndex(reader, query, 50);
>>
>>     reader.close();
>> }
>>
>> public IndexWriter CreateIndex() throws IOException {
>>     Directory index = FSDirectory.open(indexDir);
>>     IndexWriter w = new IndexWriter(index, config);
>>     return w;
>> }
>>
>> public HashMap SearchIndex(IndexReader w, String query, int topk)
>>         throws IOException, ParseException {
>>     Query q = new QueryParser(Version.LUCENE_35, "Name", analyzer).parse(query);
>>
>>     IndexSearcher searcher = new IndexSearcher(w);
>>     TopScoreDocCollector collector = TopScoreDocCollector.create(topk, true);
>>     searcher.search(q, collector);
>>     ScoreDoc[] hits = collector.topDocs().scoreDocs;
>>
>>     System.out.println("Found " + hits.length + " hits.");
>>     HashMap map = new HashMap();
>>     for (int i = 0; i < hits.length; ++i) {
>>         int docId = hits[i].doc;
>>         Document d = searcher.doc(docId);
>>         map.put(docId, d.get("Name"));
>>         System.out.println((i + 1) + ". " + d.get("Name"));
>>     }
>>
>>     searcher.close();
>>     return map;
>> }
>>
>> public void addSingleField(IndexWriter w, String str) throws IOException {
>>     Document doc = new Document();
>>     doc.add(new Field("Name", str, Field.Store.YES, Field.Index.ANALYZED));
>>     w.addDocument(doc);
>> }
>>
>> 2013/2/9 Michael McCandless <luc...@mikemccandless.com>
>>
>>> Can you reduce your test case to indexing one document/field and
>>> running a single FuzzyQuery (you seem to be running two at once,
>>> OR'ing the results)?
>>>
>>> And show the complete standalone source code (eg what is topk?) so we
>>> can see how you are indexing / building the Query / searching.
>>>
>>> The default minSim is 0.5.
>>>
>>> Note that 0.01 is not useful in practice: it (should) match nearly all
>>> terms. But I agree it's odd one term is not matching.
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Sat, Feb 9, 2013 at 5:20 AM, Pierre Antoine DuBoDeNa
>>> <pad...@gmail.com> wrote:
>>> >>
>>> >> Hello,
>>> >>
>>> >> I use Lucene 3.6 and I am trying to use fuzzy queries so that I can
>>> >> match many more results.
>>> >>
>>> >> I am adding, for example, these strings:
>>> >>
>>> >> list.add("string matching");
>>> >> list.add("string123 matching");
>>> >> list.add("string matching123");
>>> >> list.add("string123 matching123");
>>> >> list.add("str4ing match2ing");
>>> >> list.add("1string 2matching");
>>> >> list.add("str_ing ma_tching");
>>> >> list.add("string_matching");
>>> >> list.add("strang mutching");
>>> >> list.add("strrring maatchinng");
>>> >> list.add("strfffing_ m atcbbhing");
>>> >> list.add("str2ing__mat3ching");
>>> >> list.add("string_m atching");
>>> >> list.add("string matching another token");
>>> >> list.add("strasding matc4hing ano23ther tok3en");
>>> >> list.add("str4ing maaatching_another 2t oken");
>>> >>
>>> >> Then I do a query:
>>> >>
>>> >> "string~0.01 matching~0.01"
>>> >>
>>> >> and I get back these results:
>>> >>
>>> >> Found 15 hits.
>>> >> 1. 1string 2matching
>>> >> 2. str_ing ma_tching
>>> >> 3. string_m atching
>>> >> 4. strang mutching
>>> >> 5. str4ing match2ing
>>> >> 6. strrring maatchinng
>>> >> 7. string matching
>>> >> 8. strasding matc4hing ano23ther tok3en
>>> >> 9. string matching another token
>>> >> 10. string matching123
>>> >> 11. string123 matching
>>> >> 12. strfffing_ m atcbbhing
>>> >> 13. string123 matching123
>>> >> 14. str4ing maaatching_another 2t oken
>>> >> 15. string_matching
>>> >>
>>> >> So only 1 result is missing (with threshold 0.01):
>>> >> str2ing__mat3ching. Any idea why? How can I extend the query to
>>> >> catch this one as well?
>>> >>
>>> >> Also, what's the default threshold for the ~ operator? Without
>>> >> specifying a threshold I get 14 results; string_matching and
>>> >> str2ing__mat3ching are missing this time.
>>> >>
>>> >> Here is the code for the queries:
>>> >>
>>> >> Query q = new QueryParser(Version.LUCENE_35, "Name", analyzer).parse(query);
>>> >>
>>> >> IndexSearcher searcher = new IndexSearcher(w);
>>> >> TopScoreDocCollector collector = TopScoreDocCollector.create(topk, true);
>>> >> searcher.search(q, collector);
>>> >> ScoreDoc[] hits = collector.topDocs().scoreDocs;
>>> >>
>>> >> Thanks for the help.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
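
One way to look at why string_matching only shows up at the 0.01 threshold, and why str2ing__mat3ching never does, is to work out the fuzzy score by hand. The sketch below is plain Java with no Lucene dependency. It rests on two assumptions that nobody in this thread confirms: that the analyzer keeps underscore-joined text such as string_matching as a single token, and that Lucene 3.x scores a fuzzy match as 1 - editDistance / min(queryTermLength, indexedTermLength) when no prefix length is set. The class and method names are made up for illustration.

```java
// Hypothetical helper, not part of Lucene: it recomputes a fuzzy score by hand
// under the assumptions stated in the paragraph above.
final class FuzzySimilaritySketch {

    // Classic Levenshtein distance (insert, delete, substitute), two-row version.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // ASSUMPTION: similarity = 1 - distance / min(length), with no prefix length.
    static float similarity(String queryTerm, String indexedTerm) {
        int shorter = Math.min(queryTerm.length(), indexedTerm.length());
        if (shorter == 0) {
            // Avoid dividing by zero; two empty terms are treated as identical.
            return queryTerm.equals(indexedTerm) ? 1.0f : 0.0f;
        }
        return 1.0f - (float) editDistance(queryTerm, indexedTerm) / shorter;
    }

    public static void main(String[] args) {
        String[] queryTerms = {"string", "matching"};
        // ASSUMPTION: each of these was indexed as one token.
        String[] indexedTerms = {"strang", "string_matching", "str2ing__mat3ching"};
        for (String indexed : indexedTerms) {
            for (String query : queryTerms) {
                System.out.println(query + " vs " + indexed
                        + " -> distance " + editDistance(query, indexed)
                        + ", similarity " + similarity(query, indexed));
            }
        }
    }
}
```

Under those assumptions, "matching" against string_matching scores 0.125, which passes a 0.01 threshold but not the default of 0.5, and that fits string_matching appearing only in the 15-hit result. Both query terms score below zero against str2ing__mat3ching, so no positive threshold would match it if it really is indexed as a single token.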