Indexing and searching txt files
Hi,

I am new to Lucene. I have several text files I would like to index and search. How do I do this?

Thanks,
jnance

--
View this message in context: http://www.nabble.com/Indexing-and-searching-txt-files-tp18031330p18031330.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Indexing and searching txt files
Thanks! Lucene in Action is very helpful.

-James
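For readers who want the underlying idea before opening the book, here is a minimal sketch of what "index, then search" means. It is plain Java, not Lucene's API: the class `ToyTextIndex` and its methods are hypothetical names made up for illustration, and the tokenization (lower-casing and splitting on non-alphanumeric characters) is an assumption that is far simpler than what a Lucene analyzer does.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy in-memory inverted index; illustrates the idea only, not Lucene's API.
class ToyTextIndex {
    // term -> names of the documents that contain it
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Lower-case the text, split on anything that is not a letter or digit,
    // and record which document each term came from.
    void add(String docName, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docName);
            }
        }
    }

    // Names of the documents containing the term (empty set if there are none).
    Set<String> search(String term) {
        Set<String> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null ? Collections.<String>emptySet() : docs;
    }

    public static void main(String[] args) {
        ToyTextIndex index = new ToyTextIndex();
        index.add("a.txt", "Lucene is a search library.");
        index.add("b.txt", "Search the index, then search again.");
        System.out.println(index.search("search"));
        System.out.println(index.search("lucene"));
    }
}
```

In a real program the text passed to `add` would come from reading each file from disk. Lucene replaces the map above with an on-disk index and adds analysis, scoring, and query parsing on top.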
Searching for instances within a document
Hi,

I am indexing lots of text files and need to see how many times a certain word comes up in each text file. Right now I have this method for "search":

    static void search(Searcher searcher, String queryString) throws ParseException, IOException {
        QueryParser parser = new QueryParser("content", new StandardAnalyzer());
        Query query = parser.parse(queryString);
        Hits hits = searcher.search(query);

        // Number of matching documents, not the number of occurrences.
        int hitCount = hits.length();
        if (hitCount == 0) {
            System.out.println("0 documents contain the word \"" + queryString + ".\"");
        }
        else {
            System.out.println(hitCount + " documents contain the word \"" + queryString + ".\"");
        }
    }

This tells me how many documents contain the word I'm looking for, but how do I get it to tell me how many times the word occurs within each document?

Thanks,
James
Re: Searching for instances within a document
Ok, I'll see if I can find anything.

Thanks,
James
Re: Searching for instances within a document
Yes, the term frequency vector is exactly what I needed. Thanks!

-James


Ajay Lakhani wrote:
>
> Hi James,
>
> Try this:
>
>     Searcher searcher = new IndexSearcher(dir);
>     QueryParser parser = new QueryParser("content", new StandardAnalyzer());
>     Query query = parser.parse(queryString);
>
>     HashSet queryTerms = new HashSet();
>     query.extractTerms(queryTerms);
>
>     Hits hits = searcher.search(query);
>
>     IndexReader reader = IndexReader.open(dir);
>
>     for (int i = 0; i < hits.length(); i++) {
>         Document d = hits.doc(i);
>         Field fid = d.getField("id");
>         Field ftitle = d.getField("title");
>         System.out.println("id is " + fid.stringValue());
>         System.out.println("title is " + ftitle.stringValue());
>
>         TermFreqVector tfv = reader.getTermFreqVector(hits.id(i), "content");
>         String[] terms = tfv.getTerms();
>         int[] freqs = tfv.getTermFrequencies(); // get the frequencies
>
>         // for each term in the query
>         for (Iterator iter = queryTerms.iterator(); iter.hasNext();) {
>             Term term = (Term) iter.next();
>
>             // for each term in the vector
>             for (int j = 0; j < terms.length; j++) {
>                 if (terms[j].equals(term.text())) {
>                     System.out.println("frequency of term [" + term.text() + "] is " + freqs[j]);
>                 }
>             }
>         }
>     }
>
> Let me know if this helps.
> Cheers
> AJ
>
> 2008/7/10 Karl Wettin <[EMAIL PROTECTED]>:
>
>> Maybe you are looking for the document TermFreqVector?
>>
>> karl
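To make clear what the term frequency vector in Ajay's snippet holds, here is a minimal plain-Java sketch of the same idea: for one document, a sorted list of terms with a count for each. It is a stand-in, not Lucene code. The class `TermFrequencySketch`, the method `termFrequencies`, and the simple tokenization are assumptions made for illustration. Note also that in Lucene `getTermFreqVector` returns null unless term vectors were stored for that field at index time, so a null check is worth adding before calling `getTerms()`.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for a per-document term frequency vector (not Lucene code).
class TermFrequencySketch {
    // Count how often each lower-cased token occurs; the TreeMap keeps terms sorted.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = termFrequencies("red fish blue fish red red");
        // A term that never occurs has no entry, so fall back to 0.
        System.out.println("red: " + freqs.getOrDefault("red", 0));
        System.out.println("fish: " + freqs.getOrDefault("fish", 0));
        System.out.println("green: " + freqs.getOrDefault("green", 0));
    }
}
```

The count for a query term in a document is then a single lookup, which is what the inner loop over `terms` and `freqs` does in the Lucene version.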
Re: Searching for instances within a document
The TermFreqVector works perfectly for normal query strings. But if I add a wildcard (*) onto words to search for different forms of the word, I get an ArrayIndexOutOfBoundsException because the index is -1. Why does this happen? And is there any way to avoid it?

Thanks,
James
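The thread does not show the code that throws, so the following is only one plausible explanation. A wildcard query matches terms across the whole index, so the expanded term list can include terms that a particular document does not contain. `TermFreqVector.indexOf(String)` returns -1 for a term that is absent from the document's vector, and using that value as an index into the frequency array throws `ArrayIndexOutOfBoundsException`. The sketch below shows the guard in plain Java. The class `SafeFrequencyLookup` and its method are hypothetical, the two arrays stand in for `getTerms()` and `getTermFrequencies()`, and the terms are assumed to be sorted.

```java
import java.util.Arrays;

// Stand-in for the lookup; the two arrays mimic getTerms() and getTermFrequencies().
class SafeFrequencyLookup {
    // Returns the frequency of term, or 0 when the document does not contain it.
    // Assumes sortedTerms is in ascending order and parallel to freqs.
    static int frequencyOf(String[] sortedTerms, int[] freqs, String term) {
        int idx = Arrays.binarySearch(sortedTerms, term);
        // A negative index means "not found"; using it as an array index would throw.
        return idx >= 0 ? freqs[idx] : 0;
    }

    public static void main(String[] args) {
        // Terms and counts for one document.
        String[] terms = {"test", "tested", "testing"};
        int[] freqs = {4, 1, 2};
        // Pretend "test*" expanded to these index-wide terms; "tester" is not in this document.
        String[] expanded = {"test", "tester", "testing"};
        int total = 0;
        for (String term : expanded) {
            total += frequencyOf(terms, freqs, term);
        }
        System.out.println(total); // prints 6
    }
}
```

In the Lucene code the equivalent guard would be to check the result of the index lookup for a negative value before reading the frequency array, and to skip or count zero for that term.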