Hi
Is it possible to get the list of terms within a document, along with the
term frequency (TF) of each of these terms *in that document only*? (Lucene 5.3)
IndexReader has a method "Terms getTermVector(int docID, String field)",
which gives me a "Terms" object, on which I can get a TermsEnum. But I
do not know whe
Hi,
With the TermsEnum you can iterate over all terms, and each one reports its term
frequency. Of course, you need to enable term vectors during indexing. Examples
of how to use TermsEnum can be found in various places in the Lucene
source code. It's a very expert API but it is the way to go her
Thanks, but TermsEnum has two methods that return frequency-related
info, and both are corpus-level, not document-specific:
-docFreq() Returns the number of documents containing the current term.
-totalTermFreq() Returns the total number of occurrences of this term
across all documents (the sum of
Hi,
For a TermsEnum obtained from term vectors, docFreq() is always 1 and the term
frequency (totalTermFreq()) is the one from the document whose term vectors you
requested.
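
To make this concrete, here is a minimal sketch of how the per-document
frequencies might be collected with the Lucene 5.3 API. The class name
DocTermFreqs and the method termFrequencies are made up for illustration,
and the field is assumed to have been indexed with term vectors enabled
(for example via FieldType.setStoreTermVectors(true)).

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper class, not part of Lucene.
public final class DocTermFreqs {

    private DocTermFreqs() {}

    // Returns term -> frequency within the given document only.
    public static Map<String, Long> termFrequencies(
            IndexReader reader, int docID, String field) throws IOException {
        Map<String, Long> freqs = new LinkedHashMap<>();

        // Null if the document has no term vector for this field
        // (e.g. term vectors were not enabled at indexing time).
        Terms vector = reader.getTermVector(docID, field);
        if (vector == null) {
            return freqs;
        }

        TermsEnum termsEnum = vector.iterator();
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            // On a term-vector TermsEnum the "index" is this single document,
            // so totalTermFreq() is the in-document frequency.
            freqs.put(term.utf8ToString(), termsEnum.totalTermFreq());
        }
        return freqs;
    }
}
```

Calling DocTermFreqs.termFrequencies(reader, docID, "body") on a document
whose "body" field is "a b a c a" (whitespace-tokenized) should give a=3,
b=1, c=1, regardless of what other documents in the index contain.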
Term vectors simply implement the same interface; each one can be seen as a small
single-document index. It is designed this way to allow executing queries for
highlighting on sing