get frequency of each term from a document

2015-09-20 Thread Ziqi Zhang
Hi, is it possible to get a list of the terms in a document, along with the term frequency (TF) of each term *in that document only*? (Lucene 5.3) IndexReader has a method "Terms getTermVector(int docID, String field)", which gives me a "Terms" object from which I can get a TermsEnum. But I do not know whe…

Re: get frequency of each term from a document

2015-09-20 Thread Uwe Schindler
Hi, with the terms enum you can iterate over all terms, and for each one you can get its term frequency. Of course, you need to enable term vectors during indexing. Examples of how to use a terms enum can be found in various places in the Lucene source code. It's a very expert API, but it is the way to go her…
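The "enable term vectors during indexing" step can be illustrated with a minimal sketch. It assumes the Lucene 5.x to 9.x API (`FieldType.setStoreTermVectors`, `IndexReader.getTermVector`); the class name, field names, sample text and temporary index directory are hypothetical. It indexes one document with a `body` field that stores term vectors and a `title` field that does not, then checks which field returns a non-null `Terms` from `getTermVector`.

```java
import java.io.IOException;
import java.nio.file.Files;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical demo class: shows that term vectors exist only for fields indexed with them.
public class TermVectorIndexing {

    /** Returns {body has term vectors, title has term vectors} for document 0. */
    public static boolean[] checkTermVectors() throws IOException {
        // Copy the default full-text field type and switch term vectors on.
        FieldType withVectors = new FieldType(TextField.TYPE_NOT_STORED);
        withVectors.setStoreTermVectors(true);
        withVectors.freeze();

        try (Directory dir = FSDirectory.open(Files.createTempDirectory("tv-demo"))) {
            try (StandardAnalyzer analyzer = new StandardAnalyzer();
                 IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new Field("body", "lucene rocks lucene", withVectors));
                // Plain TextField: searchable, but no term vectors are stored.
                doc.add(new TextField("title", "lucene notes", Field.Store.NO));
                writer.addDocument(doc);
            }
            try (IndexReader reader = DirectoryReader.open(dir)) {
                // getTermVector returns null for a field indexed without term vectors.
                return new boolean[] {
                    reader.getTermVector(0, "body") != null,
                    reader.getTermVector(0, "title") != null
                };
            }
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = checkTermVectors();
        System.out.println("body has term vectors: " + r[0]);
        System.out.println("title has term vectors: " + r[1]);
    }
}
```

Checking for `null` before calling `iterator()` on the result is the usual guard, since a missing term vector is not reported as an exception.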

Re: get frequency of each term from a document

2015-09-20 Thread Ziqi Zhang
Thanks, but TermsEnum has two methods that return frequency-related info, and both are corpus-level, not document-specific:
- docFreq(): returns the number of documents containing the current term.
- totalTermFreq(): returns the total number of occurrences of this term across all documents (the sum of…
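To make the corpus-level meaning of these two statistics concrete, here is a sketch that reads them from a small index. Instead of a TermsEnum over the whole field, it uses the equivalent `IndexReader.docFreq(Term)` and `IndexReader.totalTermFreq(Term)` calls, which avoids version-specific helpers for merging segments. The class name, field name and three sample documents are hypothetical, and the Lucene 5.x to 9.x API is assumed.

```java
import java.io.IOException;
import java.nio.file.Files;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical demo class: corpus-wide statistics for one term.
public class CorpusStats {

    /** Returns {docFreq, totalTermFreq} of "lucene" in field "body" over three documents. */
    public static long[] corpusStats() throws IOException {
        try (Directory dir = FSDirectory.open(Files.createTempDirectory("corpus-demo"))) {
            try (StandardAnalyzer analyzer = new StandardAnalyzer();
                 IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                for (String text : new String[] {"lucene rocks lucene", "lucene search", "search engine"}) {
                    Document doc = new Document();
                    doc.add(new TextField("body", text, Field.Store.NO));
                    writer.addDocument(doc);
                }
            }
            try (IndexReader reader = DirectoryReader.open(dir)) {
                Term lucene = new Term("body", "lucene");
                // Both numbers describe the whole index, not any single document.
                return new long[] {reader.docFreq(lucene), reader.totalTermFreq(lucene)};
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long[] s = corpusStats();
        System.out.println("docFreq=" + s[0] + " totalTermFreq=" + s[1]);
    }
}
```

Here "lucene" appears in two of the three documents (docFreq 2) and three times overall (totalTermFreq 3). Neither number reveals that the first document alone contains it twice.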

Re: get frequency of each term from a document

2015-09-20 Thread Uwe Schindler
Hi, for a term vectors enum the doc freq is always 1, and the term freq is the frequency within the document whose term vectors you retrieved. Term vectors just implement the same interface, but they can be seen as a small index per document. It is designed this way to allow executing queries for highlighting on sing…
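Following that explanation, a per-document TF map can be built by walking the term vector's TermsEnum and reading `totalTermFreq()`, which on a term-vector enum is the count within that one document. This is a sketch, not a definitive implementation. It uses the no-argument `Terms.iterator()`; Lucene 4.x and some early 5.x releases instead have `iterator(TermsEnum reuse)`, to which you would pass `null`. Class, method and field names and the sample documents are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.util.Map;
import java.util.TreeMap;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper: per-document term frequencies from a term vector.
public class DocTermFreqs {

    /** Term -> frequency of that term within a single document. */
    public static Map<String, Long> termFreqs(IndexReader reader, int docId, String field)
            throws IOException {
        Map<String, Long> freqs = new TreeMap<>();
        Terms vector = reader.getTermVector(docId, field);
        if (vector == null) {
            return freqs; // field was indexed without term vectors
        }
        TermsEnum te = vector.iterator(); // older releases: vector.iterator(null)
        BytesRef term;
        while ((term = te.next()) != null) {
            // On a term-vector enum, docFreq() is always 1 and totalTermFreq()
            // is the number of occurrences in this document only.
            freqs.put(term.utf8ToString(), te.totalTermFreq());
        }
        return freqs;
    }

    /** Indexes two documents with term vectors and returns the TF map of document 0. */
    public static Map<String, Long> demo() throws IOException {
        FieldType withVectors = new FieldType(TextField.TYPE_NOT_STORED);
        withVectors.setStoreTermVectors(true);
        withVectors.freeze();

        try (Directory dir = FSDirectory.open(Files.createTempDirectory("doc-tf-demo"))) {
            try (StandardAnalyzer analyzer = new StandardAnalyzer();
                 IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                for (String text : new String[] {"lucene rocks lucene", "lucene lucene lucene search"}) {
                    Document doc = new Document();
                    doc.add(new Field("body", text, withVectors));
                    writer.addDocument(doc);
                }
            }
            try (IndexReader reader = DirectoryReader.open(dir)) {
                return termFreqs(reader, 0, "body");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // {lucene=2, rocks=1}
    }
}
```

The second document contains "lucene" three more times, but document 0's map still reports 2, because the term vector is the "small index" of that single document.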