Re: Extracting data from Lucene index files

2006-12-25 Thread Venkateshprasanna
Thanks a lot Doron, it worked fine and thanks for your tip as well!
Prasanna
> Using term vectors means passing on the terms too many times, i.e.:
> - loop on terms
> - - loop on docs of a term
> - - - loop on terms of a doc
> Would something like this be better: do { System.out.println(tenum.

Re: Extracting data from Lucene index files

2006-12-19 Thread Venkateshprasanna
> Take a look at TermDocs and TermEnum.
I need to get the frequency of each word in each of the documents I have indexed. This is what I could do with TermEnum and TermDocs: for each Term from the TermEnum, I have instantiated a TermDocs, and for each doc, I am trying to get the frequency of the Ter
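For readers following the thread, the nesting described above (outer loop on terms, inner loop on the documents of each term) can be sketched in plain Java. This is only an illustration under assumptions: it does not use Lucene classes, and `TermFrequencySketch`, `build` and `rows` are hypothetical names, with a sorted in-memory map standing in for the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for an index: term -> (docId -> occurrences).
class TermFrequencySketch {

    /** Builds the map from raw document texts; docId is the list position. */
    static SortedMap<String, SortedMap<Integer, Integer>> build(List<String> docs) {
        SortedMap<String, SortedMap<Integer, Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String token : docs.get(docId).toLowerCase().split("\\W+")) {
                if (token.isEmpty()) {
                    continue; // split() can yield a leading empty token
                }
                index.computeIfAbsent(token, k -> new TreeMap<>())
                     .merge(docId, 1, Integer::sum);
            }
        }
        return index;
    }

    /** Emits one "term<TAB>docId<TAB>freq" row per (term, document) pair. */
    static List<String> rows(SortedMap<String, SortedMap<Integer, Integer>> index) {
        List<String> out = new ArrayList<>();
        // Outer loop on terms, inner loop on the docs of each term.
        for (Map.Entry<String, SortedMap<Integer, Integer>> term : index.entrySet()) {
            for (Map.Entry<Integer, Integer> doc : term.getValue().entrySet()) {
                out.add(term.getKey() + "\t" + doc.getKey() + "\t" + doc.getValue());
            }
        }
        return out;
    }
}
```

Each row is a (term, document, frequency) triple, which is a shape that could be written to a database table one row at a time.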

Extracting data from Lucene index files

2006-12-13 Thread Venkateshprasanna
I would like to extract the data stored in Lucene indexes, such as the words and their frequencies, and store it in a database. Can anyone suggest a way of going about it, or is it not possible at all? TIA Prasanna -- View this message in context: http://www.nabble.com/Extracting-data-from-Lucene-inde

Storing no. of occurrences of a token

2006-09-13 Thread Venkateshprasanna
Is it possible for me to store the number of occurrences of a token in a particular document or a collection of documents? Regards, Venkateshprasanna -- View this message in context: http://www.nabble.com/Storing-no.-of-occurances-of-a-token-tf2263455.html#a6280422 Sent from the Lucene - Java

Indexing MS Powerpoint files with Lucene

2006-09-06 Thread Venkateshprasanna
Is there any filter available for extracting text from MS Powerpoint files and indexing them? The Lucene website suggests the POI project, which, it seems, does not support PPT files as of now. Regards, Venkateshprasanna -- View this message in context: http://www.nabble.com/which-way-to-index

Re: Atomic index/search for a phrase

2006-09-06 Thread Venkateshprasanna
Which is more efficient with respect to performance: indexing a phrase as it is and searching with the help of a TermQuery, or storing only single words in the index and making use of quoted search phrases?
Regards, Venkateshprasanna
> If you index "A Phrase" as untokenized, you

How does PhraseQuery search for quoted phrases?

2006-09-06 Thread Venkateshprasanna
How does PhraseQuery search for quoted phrases when the index does not store these phrases as they are? Is there any analyzer that indexes the phrases? -- View this message in context: http://www.nabble.com/How-does-PhraseQuery-search-for-quoted-phrases--tf2225757.html#a6167885 Sent from the Luce
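As background to the question: an index that records the position of every term occurrence can answer phrase queries without storing the phrases themselves, by checking that the terms occur at consecutive positions in the same document. The sketch below illustrates that idea in plain Java. It is not Lucene's implementation; `PositionalIndexSketch`, `add` and `phraseSearch` are hypothetical names, and the phrase is assumed to be passed as plain words without the surrounding quotes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical positional index: term -> (docId -> positions in that doc).
class PositionalIndexSketch {
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Indexes single words only, recording each word's position. */
    void add(int docId, String text) {
        int pos = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty token
            }
            postings.computeIfAbsent(token, k -> new TreeMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos++);
        }
    }

    /** Returns the ids of documents containing the words as an exact phrase. */
    List<Integer> phraseSearch(String phrase) {
        String[] terms = phrase.toLowerCase().split("\\W+");
        List<Integer> hits = new ArrayList<>();
        Map<Integer, List<Integer>> first = postings.get(terms[0]);
        if (first == null) {
            return hits;
        }
        for (Map.Entry<Integer, List<Integer>> entry : first.entrySet()) {
            for (int start : entry.getValue()) {
                if (matchesAt(entry.getKey(), start, terms)) {
                    hits.add(entry.getKey());
                    break; // one match per document is enough
                }
            }
        }
        return hits;
    }

    /** True if terms[1..] occur at start+1, start+2, ... in the given doc. */
    private boolean matchesAt(int docId, int start, String[] terms) {
        for (int i = 1; i < terms.length; i++) {
            Map<Integer, List<Integer>> byDoc = postings.get(terms[i]);
            List<Integer> positions = (byDoc == null) ? null : byDoc.get(docId);
            if (positions == null || !positions.contains(start + i)) {
                return false;
            }
        }
        return true;
    }
}
```

With documents 1 ("the quick brown fox") and 2 ("brown quick fox") added, a search for "quick brown" matches only document 1, even though both documents contain both words.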

Where do I get org.apache.commons.collections package sources?

2006-09-05 Thread Venkateshprasanna
I saw these classes and want to use them for my implementation as well, but I cannot find the source code for the specified package: org.apache.commons.collections. Is there any other way of implementing the same? Why must only classes from that package be used? Regards, Venkateshprasanna

Indexing bigrams and trigrams in Lucene

2006-09-03 Thread Venkateshprasanna
I need to index bigrams and trigrams in a document. Here is an example:
Text: This is a text document written by someone. Read this and post your comments
Words that must be indexed: text document written someone read post your comments text document document written post your your comments text
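One reading of the example above is that an n-gram is formed only from indexed words that are adjacent in the original text, so a stopword or a sentence end breaks the sequence. A minimal Java sketch of that reading follows. It does not use Lucene's analyzer API; `NGramSketch` and `ngrams` are hypothetical names, and the stopword list is an assumption chosen to fit the example text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NGramSketch {
    // Assumed stopword list, chosen only to fit the example text.
    static final Set<String> STOPWORDS =
        new HashSet<>(Arrays.asList("this", "is", "a", "by", "and"));

    /** Returns the n-grams built from runs of adjacent non-stopwords. */
    static List<String> ngrams(String text, int n) {
        List<String> result = new ArrayList<>();
        // A sentence end breaks adjacency, so handle sentences separately.
        for (String sentence : text.split("[.!?]+")) {
            List<String> run = new ArrayList<>();
            for (String token : sentence.toLowerCase().split("[^a-z0-9]+")) {
                if (token.isEmpty()) {
                    continue; // split() can yield a leading empty token
                }
                if (STOPWORDS.contains(token)) {
                    emit(run, n, result); // a stopword ends the current run
                    run.clear();
                } else {
                    run.add(token);
                }
            }
            emit(run, n, result);
        }
        return result;
    }

    /** Appends every window of n consecutive words in the run. */
    private static void emit(List<String> run, int n, List<String> out) {
        for (int i = 0; i + n <= run.size(); i++) {
            out.add(String.join(" ", run.subList(i, i + n)));
        }
    }
}
```

Calling `ngrams(text, 1)` and `ngrams(text, 2)` on the example text gives the single words and the bigrams listed in the post; `ngrams(text, 3)` gives the trigrams under the same assumptions.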