Yeah, that's customizing the Lucene source. :)  I should have gone into
more detail; I will next time.

On Wed, Nov 10, 2010 at 2:10 PM, Michael McCandless
<luc...@mikemccandless.com> wrote:
> Actually, the .tii file pre-flex (3.x) is nearly identical to the .tis
> file, except that it contains only every 128th term.
>
> If you just make SegmentTermEnum public (or sneak your class into the
> oal.index package), then you can instantiate SegmentTermEnum, passing
> it an IndexInput opened on the .tii file.
>
> Then you can enum the terms directly...
>
> Mike
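
(For reference, a rough, untested sketch of that approach against
Lucene 3.x.  The helper class name is made up, and it is dropped into
org.apache.lucene.index so it can reach the package-private
SegmentTermEnum constructor; the constructor arguments are my
recollection of the 3.x sources, so check them against your version.
It also assumes the segment is not using the compound file format, so
a separate .tii actually exists on disk.)

// Illustrative only: lives in oal.index to see package-private classes.
package org.apache.lucene.index;

import java.io.File;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IndexInput;

public class TiiTermDumper {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File(args[0]));  // index directory
    String segment = args[1];                              // segment name, e.g. "_0"
    // Field name/number mapping is needed to decode the terms.
    FieldInfos fieldInfos = new FieldInfos(dir, segment + ".fnm");
    IndexInput input = dir.openInput(segment + ".tii");
    // isIndex=true: we are reading the term index (.tii), which holds only
    // every termIndexInterval-th (by default 128th) term.
    SegmentTermEnum indexEnum = new SegmentTermEnum(input, fieldInfos, true);
    try {
      while (indexEnum.next()) {
        System.out.println(indexEnum.term());
      }
    } finally {
      indexEnum.close();
      dir.close();
    }
  }
}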
>
> On Wed, Nov 10, 2010 at 4:02 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
>> Hello all,
>>
>> We have an extremely large number of terms in our indexes.  I want to be
>> able to extract a sample of the terms, say something like every 128th term.
>> If I use code based on org.apache.lucene.misc.HighFreqTerms or
>> org.apache.lucene.index.CheckIndex, I would get a TermEnum, call
>> termEnum.next() 128 times, grab the term, then call next() another 128
>> times, and so on:
>> TermEnum termEnum = reader.terms();
>> while (termEnum.next()) {
>>   // skip 127 terms, keep the 128th, repeat
>> }
>>
>> Since the .tii file contains every 128th (or termIndexInterval-th) term and
>> is loaded into memory, is there some programmatic way (in the public API)
>> to read that in-memory data structure, rather than having to force Lucene
>> to read the entire .tis file by calling termEnum.next()?
>>
>>
>> Tom Burton-West
>> http://www.hathitrust.org/blogs/large-scale-search
>>
>>
>
