Greg Shackles wrote:
I'm not sure offhand how to write the code to do it, but I know when you open an index in Luke, that is one of the numbers it gives you. If you want to just get the number once that would be an easy way to do it. If you want the code for it, Luke is open source so you could see how they do it. (I used Luke as a starting point at one point for seeing how to get a list of high frequency terms).
Luke currently uses the same method as you used, i.e. creates a TermEnum and traverses all terms. This is fast enough and doesn't require access to implementation details.
There is a faster way to do it, but it's not exposed through API. SegmentReader (a concrete impl. of IndexReader) opens a TermInfosReader, which has a field SegmentTermEnum:indexEnum, which in turn has a field "size", and this is the number of terms. Accessing this information this way would be messy - it's better to propose that this information should be added to API.
-- Best regards, Andrzej Bialecki <>< ___. ___ ___ ___ _ _ __________________________________ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org