Hi, I am experimenting with the Lucene trunk (aka 4.0), especially with the new IndexDocValues feature. I am trying to store some query-independent statistics such as PageRank, etc. One stat that I am trying to store is the sum of all the term frequencies in a document. This can be seen as the document length. Is there a way to pre-compute this sum while performing the indexing?
Thank you, h.

> TermVectors are still available in Lucene trunk aka 4.0; we just changed
> their implementation to fit the general Lucene Terms/Fields/… APIs.
> TermVectors (if enabled in the document during indexing) are simply
> something like a small index per document, written to disk like a stored
> field (they have nothing to do with DocValues, which you mentioned).
> Theoretically, you can execute a query against the small “TermVectors
> Index” and get exactly one hit if the query matches this document, or no
> hit otherwise. This is used, for example, for highlighting if TVs are
> enabled. To support this “TV as a small index”, the old API was removed
> and the new TermVectors API returns the same Terms/TermsEnum/DocsEnum APIs
> as IndexReader does for a complete index, but all structures simply return
> one document (ID=0) and the corresponding term frequencies/doc frequencies.
>
> For example code showing how to use it, see the Lucene test cases, for
> example:
>
> Terms result =
>     reader.getTermVectors(docId).terms(DocHelper.TEXT_FIELD_2_KEY);
> assertNotNull(result);
> assertEquals(3, result.getUniqueTermCount());
> TermsEnum termsEnum = result.iterator(null);
> while (termsEnum.next() != null) {
>   String term = termsEnum.term().utf8ToString();
>   int freq = (int) termsEnum.totalTermFreq();
>   assertTrue(freq > 0);
> }
>
> Fields results = reader.getTermVectors(docId);
> assertTrue(results != null);
> assertEquals("We do not have 3 term freq vectors", 3,
>     results.getUniqueFieldCount());
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
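
For what it's worth, here is a minimal, Lucene-free sketch of the quantity being asked about: the sum of all term frequencies in one document, which is the same as the number of tokens the document produced. Everything in it is an assumption made for illustration: the class name DocLengthSketch, the helper names, and the lowercase/whitespace tokenization that stands in for a real analyzer. The per-document term-frequency map plays the role of the single-document term vector described in the quoted reply, so summing its values should correspond to summing the per-term frequencies seen while iterating the TermsEnum in the quoted snippet.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Lucene code: models one document's term vector
// as a map from term to its frequency within that document.
class DocLengthSketch {

    // Assumed stand-in for an analyzer: lowercase, then split on whitespace.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.trim().toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                // Increment the count for this term, starting at 1.
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    // Document length defined as the sum of all term frequencies.
    static long sumOfTermFrequencies(Map<String, Integer> freqs) {
        long sum = 0;
        for (int freq : freqs.values()) {
            sum += freq;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs =
            termFrequencies("the quick brown fox jumps over the quick dog");
        System.out.println("document length = " + sumOfTermFrequencies(freqs));
    }
}
```

Because the sum equals the token count, the value can be computed once per document at indexing time, before the document is added, and then stored as a per-document value alongside other query-independent statistics.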