Yeah, that's customizing the Lucene source. :) I should have gone into more detail; I will next time.
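A rough sketch of what this could look like against the Lucene 3.x internals Mike describes below: the segment name, the FieldInfos constructor, and the package placement here are assumptions, and SegmentTermEnum is package-private unless you patch it to be public, so treat this as an untested outline rather than a recipe.

// Rough sketch only: assumes Lucene 3.x and that this class lives in
// org.apache.lucene.index, since SegmentTermEnum is package-private there.
package org.apache.lucene.index;

import java.io.File;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IndexInput;

public class TiiTermDumper {                        // hypothetical helper class
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File(args[0]));
    String segment = args[1];                       // e.g. "_0" (assumed segment name)

    // Field metadata is needed to map field numbers back to field names.
    FieldInfos fieldInfos = new FieldInfos(dir, segment + ".fnm");

    // Open the terms index (.tii) directly; it holds only every 128th
    // (indexInterval-th) term, so enumerating it is already the sample.
    IndexInput tii = dir.openInput(segment + ".tii");
    SegmentTermEnum indexEnum = new SegmentTermEnum(tii, fieldInfos, true); // true = index file

    try {
      while (indexEnum.next()) {
        Term t = indexEnum.term();
        System.out.println(t.field() + ":" + t.text());
      }
    } finally {
      indexEnum.close();  // also closes the underlying IndexInput
      dir.close();
    }
  }
}

Note this works per segment, so the output would need to be merged across segments for a whole-index sample, and none of it applies once the flex (4.0) formats replace .tis/.tii.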
On Wed, Nov 10, 2010 at 2:10 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Actually, the .tii file pre-flex (3.x) is nearly identical to the .tis
> file, just that it only contains every 128th term.
>
> If you just make SegmentTermEnum public (or sneak your class into the
> oal.index package) then you can instantiate SegmentTermEnum, passing
> it an IndexInput opened on the .tii file.
>
> Then you can enum the terms directly...
>
> Mike
>
> On Wed, Nov 10, 2010 at 4:02 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
>> Hello all,
>>
>> We have an extremely large number of terms in our indexes. I want to be
>> able to extract a sample of the terms, say something like every 128th term.
>> If I use code based on org.apache.lucene.misc.HighFreqTerms or
>> org.apache.lucene.index.CheckIndex, I would get a TermEnum, call
>> termEnum.next() 128 times, grab the term, and then call next() another 128
>> times:
>>
>> termEnum = reader.terms();
>> while (termEnum.next())
>> { }
>>
>> Since the .tii file contains every 128th (or indexInterval-th) term and it
>> is loaded into memory, is there some programmatic way (in the public API)
>> to read that data structure in memory rather than having to force Lucene
>> to actually read the entire .tis file by using termEnum.next()?
>>
>> Tom Burton-West
>> http://www.hathitrust.org/blogs/large-scale-search

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
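For comparison, the public-API fallback Tom sketches above (stepping through every term and keeping each 128th) would look roughly like this on Lucene 3.x; the index path argument, class name, and the printed format are placeholders.

import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.FSDirectory;

public class SampleEvery128thTerm {                 // hypothetical example class
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])));
    try {
      TermEnum termEnum = reader.terms();
      int i = 0;
      // Walks all terms via next(); keeps every 128th term seen.
      while (termEnum.next()) {
        if (i++ % 128 == 0) {
          Term t = termEnum.term();
          System.out.println(t.field() + ":" + t.text());
        }
      }
      termEnum.close();
    } finally {
      reader.close();
    }
  }
}

This forces Lucene to read the entire .tis file, which is exactly the cost the .tii approach avoids.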