Re: IndexDocValues and storing Stats

Hany Azzam Wed, 04 Jan 2012 04:15:49 -0800

Hi,

I am experimenting with the Lucene trunk (aka 4.0), especially with the new 
IndexDocValues feature. I am trying to store some query-independent statistics 
such as PageRank. One statistic that I am trying to store is the sum of all the 
term frequencies in a document. This can be seen as the document length. Is 
there a way to pre-compute this sum while performing the indexing?
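
For concreteness, here is a minimal plain-Java sketch of the statistic in question. It uses no Lucene APIs at all. The whitespace tokenizer and the names DocLengthSketch, termFrequencies, and documentLength are illustrative assumptions, not anything from the Lucene code base. It only shows that summing the per-term frequencies of a document gives the same number as counting its tokens, so the value is already known once a document has been analyzed:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DocLengthSketch {

    // Count how often each term occurs, using a naive lowercase/whitespace
    // tokenizer (an assumption; a real analyzer would tokenize differently).
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    // Document length defined as the sum of all term frequencies.
    public static long documentLength(Map<String, Integer> freqs) {
        long sum = 0;
        for (int f : freqs.values()) {
            sum += f;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs =
            termFrequencies("the quick fox jumps over the lazy dog the end");
        System.out.println("term frequencies: " + freqs);
        System.out.println("document length:  " + documentLength(freqs));
    }
}
```

How such a per-document number would then be written into an IndexDocValues field is exactly the open question and is not shown here.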


Thank you,
h.



> TermVectors are still available in Lucene trunk (aka 4.0); we just changed 
> their implementation to fit the general Lucene Terms/Fields/… APIs. 
> TermVectors (if enabled for the document during indexing) are essentially a 
> small index per document, written to disk like a stored field (they have 
> nothing to do with DocValues, which you mentioned). Theoretically, you 
> can execute a query against this small “TermVectors index” and get exactly 
> one hit if the query matches the document, or no hit otherwise. This is 
> used, e.g., for highlighting when TVs are enabled. To support this “TV as a 
> small index” model, the old API was removed, and the new TermVectors API 
> returns the same Terms/TermsEnum/DocsEnum APIs as IndexReader does for a 
> complete index, except that all structures contain only one document (ID=0) 
> and the corresponding term frequencies/doc frequencies.
>  
> For example code showing how to use it, review the Lucene test cases. Some 
> examples:
>  
>     Terms result = reader.getTermVectors(docId).terms(DocHelper.TEXT_FIELD_2_KEY);
>     assertNotNull(result);
>     assertEquals(3, result.getUniqueTermCount());
>     TermsEnum termsEnum = result.iterator(null);
>     while(termsEnum.next() != null) {
>       String term = termsEnum.term().utf8ToString();
>       int freq = (int) termsEnum.totalTermFreq();
>       assertTrue(freq > 0);
>     }
>  
>     Fields results = reader.getTermVectors(docId);
>     assertTrue(results != null);
>     assertEquals("We do not have 3 term freq vectors", 3, results.getUniqueFieldCount());
>  
> Uwe
>  
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
