Thanks a lot Doron, it worked fine and thanks for your tip as well!
Prasanna
Using term vectors means passing over the terms too many times, i.e.:
- loop on terms
- - loop on docs of a term
- - - loop on terms of a doc
Would something like this be better:
do {
System.out.println(tenum.
> Take a look at TermDocs and TermEnum.
I need to get the frequency of each word in each of the documents I have
indexed.
This is what I could do with TermEnums and TermDocs. For each Term from
TermEnum, I have instantiated a TermDocs and, for each doc, I am trying to
get the frequency of the Ter
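
For readers following along, here is a minimal sketch of the traversal described above: an outer loop over terms and an inner loop over the documents of each term, reading one frequency per pair. It does not use the Lucene API. The nested map is a hypothetical in-memory stand-in for the index, and the class name PostingsWalk and the sample data are made up for illustration. In Lucene itself the corresponding calls would be along the lines of IndexReader.terms() for the outer loop and IndexReader.termDocs(term) with doc() and freq() for the inner one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PostingsWalk {
    // Hypothetical stand-in for an index: term -> (docId -> frequency).
    static Map<String, Map<Integer, Integer>> samplePostings() {
        Map<String, Map<Integer, Integer>> postings = new TreeMap<>();
        add(postings, "lucene", 0, 2);
        add(postings, "lucene", 1, 1);
        add(postings, "index", 1, 3);
        return postings;
    }

    static void add(Map<String, Map<Integer, Integer>> postings,
                    String term, int doc, int freq) {
        postings.computeIfAbsent(term, t -> new TreeMap<>()).put(doc, freq);
    }

    // Outer loop over terms, inner loop over the documents of each term.
    // Each row is "term<TAB>docId<TAB>frequency".
    static List<String> rows(Map<String, Map<Integer, Integer>> postings) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Map<Integer, Integer>> term : postings.entrySet()) {
            for (Map.Entry<Integer, Integer> doc : term.getValue().entrySet()) {
                rows.add(term.getKey() + "\t" + doc.getKey() + "\t" + doc.getValue());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : rows(samplePostings())) {
            System.out.println(row);
        }
    }
}
```

Each (term, document) pair is visited exactly once, so there is no third nested loop over the terms of a document. The rows are also in a shape that could be written to a database table one row at a time.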
I would like to use the data stored in the Lucene indexes, like the words and
their frequencies and store them in a database. Can anyone suggest a way of
going about it or is it possible at all?
TIA
Prasanna
--
View this message in context:
http://www.nabble.com/Extracting-data-from-Lucene-inde
Is it possible for me to store the number of occurrences of a token in a
particular document or a collection of documents?
Regards,
Venkateshprasanna
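
To make the question concrete, here is a rough sketch of what "number of occurrences of a token" means for one document and for a collection. It is plain Java, not Lucene code. The class OccurrenceCounts, its method names, and the simple split on non-word characters are all assumptions made for illustration; a real analyzer tokenizes differently.

```java
import java.util.HashMap;
import java.util.Map;

class OccurrenceCounts {
    // Count how often each lowercased token occurs in a single document.
    static Map<String, Integer> countTokens(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Sum the per-document counts over a collection of documents.
    static Map<String, Integer> collectionTotals(String... docs) {
        Map<String, Integer> totals = new HashMap<>();
        for (String doc : docs) {
            countTokens(doc).forEach((token, n) -> totals.merge(token, n, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(countTokens("the cat saw the dog").get("the"));
        System.out.println(collectionTotals("the cat", "The dog the").get("the"));
    }
}
```

The per-document map corresponds to a term frequency, and the summed map to a collection-wide total for each token.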
--
View this message in context:
http://www.nabble.com/Storing-no.-of-occurances-of-a-token-tf2263455.html#a6280422
Sent from the Lucene - Java
Is there any filter available for extracting text from MS Powerpoint files
and indexing them?
The Lucene website suggests the POI project, which, it seems, does not
support PPT files as of now.
Regards,
Venkateshprasanna
--
View this message in context:
http://www.nabble.com/which-way-to-index
Which is more efficient with respect to performance?
Indexing a phrase as is and searching with the help of a TermQuery
OR
Storing only single words in index and making use of quoted search
phrases?
Regards,
Venkateshprasanna
If you index "A Phrase" as untokenized, you
How does PhraseQuery search for quoted phrases when the index does not store
these phrases as they are?
Is there any analyzer that indexes the phrases?
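
As a rough illustration of the usual answer: the index keeps single terms, but it also records the position of each occurrence, and a phrase matches when its words appear at consecutive positions. The sketch below is plain Java over a single document, not Lucene's implementation. The class PhraseSketch, its methods, and the naive tokenization are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PhraseSketch {
    // Build term -> list of positions for one document.
    static Map<String, List<Integer>> indexPositions(String text) {
        Map<String, List<Integer>> positions = new HashMap<>();
        int pos = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            positions.computeIfAbsent(token, t -> new ArrayList<>()).add(pos++);
        }
        return positions;
    }

    // A phrase matches if its words occur at consecutive positions.
    static boolean matchesPhrase(Map<String, List<Integer>> positions, String phrase) {
        String[] words = phrase.toLowerCase().trim().split("\\W+");
        if (words.length == 0 || words[0].isEmpty()) {
            return false;
        }
        List<Integer> starts = positions.get(words[0]);
        if (starts == null) {
            return false;
        }
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < words.length && ok; i++) {
                List<Integer> p = positions.get(words[i]);
                ok = p != null && p.contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> idx = indexPositions("The quick brown fox");
        System.out.println(matchesPhrase(idx, "quick brown"));
        System.out.println(matchesPhrase(idx, "brown quick"));
    }
}
```

Under this scheme no analyzer has to index whole phrases. Only single terms and their positions are stored, and the phrase check is done at search time.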
--
View this message in context:
http://www.nabble.com/How-does-PhraseQuery-search-for-quoted-phrases--tf2225757.html#a6167885
Sent from the Luce
I saw these classes and want to use them for my implementation as well. But I
cannot find the source code for the specified package:
org.apache.commons.collections
Is there any other way of implementing the same?
Why do only classes from that package have to be used?
Regards,
Venkateshprasanna
I need to index bigrams and trigrams in a document. Here is an example:
Text:
This is a text document written by someone. Read this and post your comments
Words that must be indexed:
text
document
written
someone
read
post
your
comments
text document
document written
post your
your comments
text
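
One way to read the example above is that an n-gram is built only from words that are adjacent in the original text, so a stop word or a sentence end breaks the run. A minimal sketch under that reading follows. It is plain Java, not a Lucene analyzer. The class NGramSketch and the stop list (this, is, a, by, and) are assumptions, chosen only because they reproduce the single words and bigrams listed in the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NGramSketch {
    // Assumed stop list, picked to match the example text only.
    static final Set<String> STOP =
            new HashSet<>(Arrays.asList("this", "is", "a", "by", "and"));

    // Return all n-grams made of n adjacent non-stop words within a sentence.
    static List<String> ngrams(String text, int n) {
        List<String> out = new ArrayList<>();
        for (String sentence : text.toLowerCase().split("[.!?]+")) {
            List<String> run = new ArrayList<>();
            for (String word : sentence.trim().split("\\s+")) {
                if (word.isEmpty() || STOP.contains(word)) {
                    emit(run, n, out);   // a stop word ends the current run
                    run.clear();
                } else {
                    run.add(word);
                }
            }
            emit(run, n, out);           // a sentence end also ends the run
        }
        return out;
    }

    // Slide a window of size n over one run of adjacent words.
    static void emit(List<String> run, int n, List<String> out) {
        for (int i = 0; i + n <= run.size(); i++) {
            out.add(String.join(" ", run.subList(i, i + n)));
        }
    }

    public static void main(String[] args) {
        String text = "This is a text document written by someone. "
                + "Read this and post your comments";
        for (int n = 1; n <= 3; n++) {
            System.out.println(n + "-grams: " + ngrams(text, n));
        }
    }
}
```

With n = 2 this yields "text document", "document written", "post your" and "your comments", matching the bigrams in the list. Whether n-grams should be allowed to skip over stop words is a design choice the example does not settle.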