Yeah, that's customizing the Lucene source. :) I should have gone into
more detail; I will next time.
On Wed, Nov 10, 2010 at 2:10 PM, Michael McCandless
wrote:
Actually, the pre-flex (3.x) .tii file is nearly identical to the .tis
file, except that it contains only every 128th term.
If you just make SegmentTermEnum public (or sneak your class into the
org.apache.lucene.index package), then you can instantiate SegmentTermEnum,
passing it an IndexInput opened on the .tii file
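
To make the .tis/.tii relationship described above concrete, here is a minimal sketch in plain Java. It is not Lucene code and does not read any index file: the class TermSampler and the method sampleEveryNth are hypothetical names, the terms are assumed to arrive in sorted order from some iterator, and sampling is assumed to start at the first term (the exact positions Lucene writes to the .tii may differ). It only shows what "every 128th term" means as a sampling rule, using 128 as the interval mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class TermSampler {
    // Interval mentioned in the thread for the 3.x term index.
    static final int DEFAULT_INTERVAL = 128;

    // Walks the full term sequence and keeps the terms at positions
    // 0, interval, 2*interval, ... (starting at 0 is an assumption).
    static List<String> sampleEveryNth(Iterator<String> terms, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        List<String> sample = new ArrayList<String>();
        long position = 0;
        while (terms.hasNext()) {
            String term = terms.next();
            if (position % interval == 0) {
                sample.add(term);
            }
            position++;
        }
        return sample;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        // A small interval keeps the demo readable.
        System.out.println(sampleEveryNth(terms.iterator(), 3)); // prints [a, d, g]
    }
}
```

Note that this sketch still visits every term, which is the cost of sampling through a full enumeration; reading the .tii directly, as suggested above, would avoid that scan.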
In a word, no. You'd need to customize the Lucene source to accomplish this.
On Wed, Nov 10, 2010 at 1:02 PM, Burton-West, Tom wrote:
Hello all,
We have an extremely large number of terms in our indexes. I want to be able
to extract a sample of the terms, say something like every 128th term. If I
use code based on org.apache.lucene.misc.HighFreqTerms or
org.apache.lucene.index.CheckIndex I would get a TermsEnum, call
term