Ahh, I see. Term vectors are effectively an inverted index for a single document, and they expose the same postings API as the whole index (including TermsEnum.totalTermFreq). But that method likely always returns -1 for term vectors, because it's not implemented? Maybe Lucene's default codec should be improved to store this; maybe open an issue?
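For the normalization itself, the total term count of a document is just the sum of the per-term frequencies in its term vector, so it can be computed on the client side. Below is a minimal sketch in plain Java, assuming the per-term frequencies have already been collected into a map. The class NormalizedTf and both method names are hypothetical and are not part of Lucene. In Lucene one would presumably fill that map by walking the Terms returned by IndexReader.getTermVector with a TermsEnum and reading each term's within-document frequency from PostingsEnum.freq().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Lucene API: works on (term -> within-document frequency)
// pairs that the caller has already read from one document's term vector.
class NormalizedTf {

    /** Total number of term occurrences in the document, duplicates included. */
    public static long totalTerms(Map<String, Long> termFreqs) {
        long total = 0;
        for (long freq : termFreqs.values()) {
            total += freq;
        }
        return total;
    }

    /** Normalized term frequency: each term's frequency divided by the document total. */
    public static Map<String, Double> normalize(Map<String, Long> termFreqs) {
        long total = totalTerms(termFreqs);
        Map<String, Double> normalized = new LinkedHashMap<>();
        if (total == 0) {
            return normalized; // empty document: nothing to normalize
        }
        for (Map.Entry<String, Long> e : termFreqs.entrySet()) {
            normalized.put(e.getKey(), e.getValue() / (double) total);
        }
        return normalized;
    }
}
```

Note that the size of the map is only the number of distinct terms (the "length of the tvf" from the original question), while totalTerms counts every occurrence.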
In the meantime you could make your own codec that does store it.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Apr 18, 2017 at 9:12 AM, Manjula Wijewickrema <manjul...@gmail.com> wrote:

> Hi Mike,
>
> Thanks for the answer. I think this returns the total number of
> occurrences of a specified term across all the documents in the corpus
> right?
>
> But I need the total number of terms (including multiple occurrences of
> the same term) in each document of the corpus. Any suggestion?
>
> Thanks!
>
> On Tue, Apr 18, 2017 at 2:53 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> I think you want to use the TermsEnum.totalTermFreq method?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Sun, Apr 16, 2017 at 11:36 AM, Manjula Wijewickrema <
>> manjul...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Is there any way to get the total count of terms in the Term Frequency
>>> Vector (tvf)? I need to calculate the Normalized term frequency of each
>>> term in my tvf. I know how to obtain the length of the tvf, but it
>>> doesn't
>>> work since I need to count duplicate occurrences as well.
>>>
>>> Highly appreciate your kind response.