Indexing and searching txt files

2008-06-20 Thread jnance

Hi,

I am new to Lucene. I have several text files I would like to index and
search. How do I do this?

Thanks,

jnance
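
For readers new to the idea, indexing and searching text can be sketched in plain Java without Lucene. The class below is a hypothetical toy (the name TinyIndex and its methods are not part of Lucene) that keeps an in-memory map from each word to the files containing it. Reading the files from disk, analysis, and ranking are all left out, so treat it as an illustration of the concept rather than a substitute for the library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index: maps each lowercased word to the names of the files containing it.
// Hypothetical illustration only; this is not how Lucene stores or analyzes text.
class TinyIndex {
    private final Map<String, TreeSet<String>> postings = new HashMap<String, TreeSet<String>>();

    // "Index" one file: split its text on non-alphanumeric characters
    // and record the file name under every word found.
    void add(String fileName, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (word.isEmpty()) continue;
            TreeSet<String> files = postings.get(word);
            if (files == null) {
                files = new TreeSet<String>();
                postings.put(word, files);
            }
            files.add(fileName);
        }
    }

    // "Search": return the sorted names of the files containing the word (empty if none).
    List<String> search(String word) {
        TreeSet<String> files = postings.get(word.toLowerCase(Locale.ROOT));
        return files == null ? new ArrayList<String>() : new ArrayList<String>(files);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("a.txt", "Lucene is a search library.");
        index.add("b.txt", "Search engines build an index first.");
        System.out.println(index.search("search"));  // [a.txt, b.txt]
        System.out.println(index.search("lucene"));  // [a.txt]
    }
}
```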
-- 
View this message in context: 
http://www.nabble.com/Indexing-and-searching-txt-files-tp18031330p18031330.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Indexing and searching txt files

2008-06-23 Thread jnance

Thanks! Lucene in Action is very helpful.

-James



Searching for instances within a document

2008-07-09 Thread jnance

Hi,

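I am indexing lots of text files and need to see how many times a certain
word comes up in each text file. Right now I have this method for
"search":

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searcher;

    static void search(Searcher searcher, String queryString)
            throws ParseException, IOException {
        // Parse the query string against the "content" field.
        QueryParser parser = new QueryParser("content", new StandardAnalyzer());
        Query query = parser.parse(queryString);
        Hits hits = searcher.search(query);

        // Report how many documents matched (prints 0 when none did).
        int hitCount = hits.length();
        System.out.println(hitCount + " documents contain the word \""
                + queryString + "\".");
    }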

This tells me how many documents contain the word I'm looking for... but how
do I get it to tell me how many times the word occurs within that document?

Thanks,

James
-- 
View this message in context: 
http://www.nabble.com/Searching-for-instances-within-a-document-tp18362075p18362075.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.





Re: Searching for instances within a document

2008-07-09 Thread jnance

Ok, I'll see if I can find anything.

Thanks,

James




Re: Searching for instances within a document

2008-07-10 Thread jnance

Yes, the term frequency vector is exactly what I needed. Thanks!

-James
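
For anyone reading along, the idea behind a term frequency vector can be sketched in plain Java: for one document, a sorted array of distinct terms plus a parallel array of counts. This is a hypothetical stand-in (the class TermVectorSketch and its tokenizing rule are assumptions, not Lucene code), meant only to show what the two arrays hold and how a lookup works.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for a per-document term frequency vector (not Lucene code):
// a sorted array of distinct terms plus a parallel array of counts.
class TermVectorSketch {
    final String[] terms;  // distinct terms in sorted order
    final int[] freqs;     // freqs[i] is how often terms[i] occurs in the document

    TermVectorSketch(String text) {
        // TreeMap keeps the terms sorted, which the binary search below relies on.
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (word.isEmpty()) continue;
            Integer old = counts.get(word);
            counts.put(word, old == null ? 1 : old + 1);
        }
        terms = counts.keySet().toArray(new String[0]);
        freqs = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            freqs[i] = counts.get(terms[i]);
        }
    }

    // Frequency of one term in this document; 0 when the term is absent.
    int frequencyOf(String term) {
        int i = Arrays.binarySearch(terms, term.toLowerCase(Locale.ROOT));
        return i >= 0 ? freqs[i] : 0;
    }

    public static void main(String[] args) {
        TermVectorSketch doc = new TermVectorSketch("The cat sat. The cat ran.");
        System.out.println(doc.frequencyOf("cat"));  // 2
        System.out.println(doc.frequencyOf("dog"));  // 0
    }
}
```

The binary search plays the role of the inner "for each term in the vector" loop in the quoted code below; it only works because the terms array is sorted.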


Ajay Lakhani wrote:
> 
> Hi James,
> 
> Try this:
> 
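> import java.util.HashSet;
> import java.util.Iterator;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.index.TermFreqVector;
> import org.apache.lucene.queryParser.QueryParser;
> import org.apache.lucene.search.Hits;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.Searcher;
> 
> Searcher searcher = new IndexSearcher(dir);
> QueryParser parser = new QueryParser("content", new StandardAnalyzer());
> Query query = parser.parse(queryString);
> 
> // Collect the terms the query is made of.
> HashSet queryTerms = new HashSet();
> query.extractTerms(queryTerms);
> 
> Hits hits = searcher.search(query);
> 
> IndexReader reader = IndexReader.open(dir);
> 
> for (int i = 0; i < hits.length(); i++) {
>   Document d = hits.doc(i);
>   Field fid = d.getField("id");
>   Field ftitle = d.getField("title");
>   System.out.println("id is " + fid.stringValue());
>   System.out.println("title is " + ftitle.stringValue());
> 
>   // Term vector for this hit; null unless "content" was indexed with term vectors.
>   TermFreqVector tfv = reader.getTermFreqVector(hits.id(i), "content");
>   if (tfv == null) continue;
>   String[] terms = tfv.getTerms();
>   int[] freqs = tfv.getTermFrequencies(); // get the frequencies
> 
>   // for each term in the query
>   for (Iterator iter = queryTerms.iterator(); iter.hasNext();) {
>     Term term = (Term) iter.next();
> 
>     // for each term in the vector
>     for (int j = 0; j < terms.length; j++) {
>       if (terms[j].equals(term.text())) {
>         System.out.println("frequency of term [" + term.text() + "] is " + freqs[j]);
>       }
>     }
>   }
> }
> reader.close();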
> 
> Let me know if this helps.
> Cheers
> AJ
> 
> 2008/7/10 Karl Wettin <[EMAIL PROTECTED]>:
> 
>> Maybe you are looking for the document TermFreqVector?
>>
>>
>>   karl




Re: Searching for instances within a document

2008-07-11 Thread jnance

The TermFreqVector works perfectly for normal query strings. But if I
add a wildcard (*) to a word to search for different forms of that word, I
get an ArrayIndexOutOfBoundsException because the index is -1. Why does this
happen? And is there any way to avoid it?

Thanks,

James
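
The failing code is not shown in the thread, so the following is an assumption about the likely cause. A wildcard word such as "run*" is a pattern, not a term that was ever stored for a document, so an exact lookup in the term vector finds nothing. TermFreqVector.indexOf reports a missing term as -1, and using that value as an array index throws ArrayIndexOutOfBoundsException. The plain-Java sketch below (hypothetical class and method names, not Lucene code) reproduces the -1 and shows one way around it: check the index before using it, and treat a trailing * as a prefix match over the stored terms.

```java
import java.util.Arrays;

// Plain-Java sketch (not Lucene code) of looking up a possibly wildcarded query word
// in a term vector represented as sorted terms plus parallel frequencies.
class WildcardLookup {
    // Returns the index of term in sortedTerms, or -1 when it is absent
    // (the same "not found" convention as TermFreqVector.indexOf).
    static int indexOf(String[] sortedTerms, String term) {
        int i = Arrays.binarySearch(sortedTerms, term);
        return i >= 0 ? i : -1;
    }

    // Total frequency for a query word; a trailing '*' is treated as a prefix match.
    static int frequency(String[] sortedTerms, int[] freqs, String queryWord) {
        if (queryWord.endsWith("*")) {
            String prefix = queryWord.substring(0, queryWord.length() - 1);
            int total = 0;
            for (int i = 0; i < sortedTerms.length; i++) {
                if (sortedTerms[i].startsWith(prefix)) {
                    total += freqs[i];
                }
            }
            return total;
        }
        int i = indexOf(sortedTerms, queryWord);
        return i < 0 ? 0 : freqs[i];  // guard: never index the array with -1
    }

    public static void main(String[] args) {
        String[] terms = {"run", "runner", "running", "walk"};
        int[] freqs = {3, 1, 2, 5};
        System.out.println(indexOf(terms, "run*"));           // -1: no stored term is literally "run*"
        System.out.println(frequency(terms, freqs, "run*"));  // 6
        System.out.println(frequency(terms, freqs, "walk"));  // 5
    }
}
```

Within Lucene itself, rewriting the query against the index (Query.rewrite(IndexReader)) before extracting its terms should expand a wildcard into the concrete terms it matches, which can then be looked up one by one with the same -1 check.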



