Hi Michael,

Thanks for the explanation. I am working with a TREC dataset; since it is static, I set the size of that array experimentally.
I followed the DefaultSimilarity#lengthNorm method a bit. If the default similarity is used and there is no index-time boost, I assume that the norm equals 1.0 / Math.sqrt(numTerms).

The first option is to somehow obtain the pre-computed norm value and apply the reverse operation to recover numTerms:

numTerms = (1/norm)^2

This will only be an approximation, because norms are stored in a single byte. How do I access that norm value for a given docid and field?

The second option is to store numTerms as a separate field, like any other organic field. Do I need to calculate it myself, or can I access the numTerms value that lengthNorm already computes during indexing?

I think I will follow the second option. Is there a pointer to an example that demonstrates writing and reading a DocValues-based field?

Thanks,
Ahmet

On Friday, February 6, 2015 11:08 AM, Michael McCandless <luc...@mikemccandless.com> wrote:

How will you know how large to allocate that array? The within-doc term freq can in general be arbitrarily large...

Lucene does not directly store the total number of terms in a document, but it does store it approximately in the doc's norm value. Maybe you can use that?

Alternatively, you can store this statistic yourself, e.g. as a doc value.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Feb 5, 2015 at 7:24 PM, Ahmet Arslan <iori...@yahoo.com.invalid> wrote:
> Hello Lucene Users,
>
> I am traversing all documents that contain a given term with the following code:
>
> Term term = new Term(field, word);
> Bits bits = MultiFields.getLiveDocs(reader);
> DocsEnum docsEnum = MultiFields.getTermDocsEnum(reader, bits, field,
>     term.bytes());
>
> while (docsEnum.nextDoc() != DocsEnum.NO_MORE_DOCS) {
>
>     array[docsEnum.freq()]++;
>
>     // how to retrieve term count for this document?
>     xxxxx(docsEnum.docID(), field);
>
> }
>
> How can I get the field term count values for these documents using Lucene 4.10.3?
>
> Is the above code OK for traversing the posting list of a term?
>
> Thanks,
> Ahmet
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
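Regarding Michael's point that the within-document term frequency has no fixed upper bound: instead of sizing the frequency array experimentally, the histogram can grow on demand. The sketch below is a hypothetical helper (FreqHistogram is not a Lucene class) that doubles its backing array whenever a larger frequency shows up; the initial capacity of 16 is an arbitrary assumption.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Lucene): a histogram of within-document
 * term frequencies whose backing array grows on demand, so no maximum
 * frequency has to be guessed up front.
 */
public class FreqHistogram {
    // Arbitrary starting capacity; grows as needed.
    private long[] counts = new long[16];

    /** Records one document whose within-document term frequency is freq. */
    public void add(int freq) {
        if (freq < 0) {
            throw new IllegalArgumentException("freq must be non-negative: " + freq);
        }
        if (freq >= counts.length) {
            // At least double the array, and make sure index freq fits.
            int newLength = Math.max(counts.length * 2, freq + 1);
            counts = Arrays.copyOf(counts, newLength);
        }
        counts[freq]++;
    }

    /** Number of documents recorded with exactly this frequency. */
    public long count(int freq) {
        return (freq >= 0 && freq < counts.length) ? counts[freq] : 0L;
    }

    /** Current size of the backing array. */
    public int capacity() {
        return counts.length;
    }

    public static void main(String[] args) {
        FreqHistogram histogram = new FreqHistogram();
        int[] freqs = {1, 1, 3, 250, 1};
        for (int f : freqs) {
            histogram.add(f);
        }
        System.out.println(histogram.count(1));   // 3
        System.out.println(histogram.count(250)); // 1
        System.out.println(histogram.count(999)); // 0
    }
}
```

Inside the posting-list loop from the original question, `array[docsEnum.freq()]++` would become `histogram.add(docsEnum.freq())`. Doubling keeps the number of copies logarithmic in the largest frequency seen, at the cost of some unused slots.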
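To get a feel for how coarse the first option (numTerms = (1/norm)^2) is, the sketch below simulates the round trip without touching an index. It assumes the default similarity with no boosts, so the norm is 1.0 / Math.sqrt(numTerms). It also assumes a one-byte float encoding that mimics Lucene's SmallFloat.floatToByte315 / byte315ToFloat. The bit arithmetic is a reconstruction written for illustration, and NormInversion is a hypothetical class, so check exact byte values against the Lucene source. In Lucene 4.x the stored per-segment byte should be readable through AtomicReader#getNormValues(field) and decodable with DefaultSimilarity#decodeNormValue; it is worth confirming both in the 4.10.3 javadocs.

```java
/**
 * Hypothetical illustration: encode a length norm into one byte, decode it,
 * and invert it to estimate the number of terms in the field.
 */
public class NormInversion {
    private static final int MANTISSA_BITS = 3;
    private static final int ZERO_EXP = 15;
    private static final int FZERO = (63 - ZERO_EXP) << MANTISSA_BITS;

    /** Assumed encoding: keeps the exponent and the top mantissa bits of a float. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - MANTISSA_BITS);
        if (smallfloat <= FZERO) {
            // Zero/negative map to 0; positive underflow maps to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= FZERO + 0x100) {
            return (byte) -1; // overflow maps to the largest code
        }
        return (byte) (smallfloat - FZERO);
    }

    /** Inverse of encode: rebuilds a float from the stored byte. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - MANTISSA_BITS);
        bits += (63 - ZERO_EXP) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** Length norm under the stated assumption (no boosts): 1 / sqrt(numTerms). */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Approximates numTerms from a stored byte as (1 / norm)^2. */
    static double estimateNumTerms(byte normByte) {
        float norm = decode(normByte);
        return 1.0 / ((double) norm * norm);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 4, 10, 100, 1000}) {
            byte b = encode(lengthNorm(n));
            System.out.printf("numTerms=%d  norm byte=%d  estimate=%.2f%n",
                    n, b & 0xff, estimateNumTerms(b));
        }
    }
}
```

Under these assumptions, 1 and 4 terms come back exactly, 10 terms comes back as 10.24, and 100 terms comes back as about 113.8, because only a few significant bits of the norm survive. This is consistent with preferring the second option (storing the exact count as a doc value) when precise field lengths matter.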