Greg Shackles wrote:
I'm not sure offhand how to write the code to do it, but I know when you
open an index in Luke, that is one of the numbers it gives you.  If you want
to just get the number once that would be an easy way to do it.  If you want
the code for it, Luke is open source so you could see how they do it.  (I
used Luke as a starting point at one point for seeing how to get a list of
high frequency terms).

Luke currently uses the same method you used: it creates a TermEnum and traverses all terms, counting as it goes. This is fast enough and doesn't require access to implementation details.
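
For reference, the traversal can be sketched roughly as below. This is only an illustration, not Luke's actual code. To keep it runnable without Lucene on the classpath, TermCursor is a hypothetical stand-in for the two TermEnum methods the loop needs (next() and close()), and TermCounter and cursorOver are names made up for the example. Against a real index you would run the same loop directly on the TermEnum returned by IndexReader.terms().

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the part of Lucene's TermEnum used here:
// next() advances to the following term, close() releases resources.
interface TermCursor {
    boolean next() throws IOException;
    void close() throws IOException;
}

final class TermCounter {
    // Walks the whole enumeration once and counts the terms seen.
    static long countTerms(TermCursor terms) throws IOException {
        long count = 0;
        try {
            while (terms.next()) {
                count++;
            }
        } finally {
            terms.close(); // always release the enumeration
        }
        return count;
    }

    // In-memory cursor so the sketch can be tried without an index
    // (an assumption for illustration, not part of Lucene).
    static TermCursor cursorOver(List<String> terms) {
        final Iterator<String> it = terms.iterator();
        return new TermCursor() {
            public boolean next() {
                if (it.hasNext()) {
                    it.next();
                    return true;
                }
                return false;
            }

            public void close() {
                // nothing to release for an in-memory list
            }
        };
    }
}
```

The cost is linear in the number of terms, which is why a stored count would be faster.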

There is a faster way to do it, but it's not exposed through the API. SegmentReader (a concrete implementation of IndexReader) opens a TermInfosReader, which has a SegmentTermEnum field named indexEnum, which in turn has a field "size"; this is the number of terms. Getting at the information this way would be messy, so it's better to propose that it be added to the API.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

