Re: Determining index term count

Andrzej Bialecki Wed, 07 Jan 2009 12:43:51 -0800

Greg Shackles wrote:

I'm not sure offhand how to write the code to do it, but I know when you
open an index in Luke, that is one of the numbers it gives you.  If you want
to just get the number once that would be an easy way to do it.  If you want
the code for it, Luke is open source so you could see how they do it.  (I
used Luke as a starting point at one point for seeing how to get a list of
high frequency terms).

Luke currently uses the same method as you used, i.e. creates a TermEnumand traverses all terms. This is fast enough and doesn't require accessto implementation details.

There is a faster way to do it, but it's not exposed through API.SegmentReader (a concrete impl. of IndexReader) opens a TermInfosReader,which has a field SegmentTermEnum:indexEnum, which in turn has a field"size", and this is the number of terms. Accessing this information thisway would be messy - it's better to propose that this information shouldbe added to API.


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Determining index term count

Reply via email to