Paul Elschot wrote:

On Tuesday 11 November 2008 21:55:45, Michael McCandless wrote:
Also, one nice optimization we could do with the "term number
column-stride array" is to do bit packing (borrowing from the PFOR
code) dynamically.

I.e., since we know there are X unique terms in this segment, when
populating the array that maps docID to term number we could use
exactly the right number of bits.  Enumerated fields with not many
unique values (e.g. country, state) would take relatively little RAM.
With LUCENE-1231, where the fields are stored column-stride on disk,
we could do this packing during indexing so that loading at search
time is very fast.
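
To make the idea concrete, here is a minimal sketch of simple fixed-width bit packing for a docID-to-term-number array. It is an illustration only, not Lucene code and not the PFOR code: the names bits_required and PackedTermNumbers are hypothetical, and the little-endian bit layout is an assumption chosen for brevity. Because every entry has the same width, a lookup is one offset computation with no exception handling.

```python
def bits_required(num_terms: int) -> int:
    """Bits needed to store any term number in 0 .. num_terms - 1."""
    if num_terms < 1:
        raise ValueError("num_terms must be at least 1")
    # Use at least one bit so a single-term field still has a valid width.
    return max(1, (num_terms - 1).bit_length())


class PackedTermNumbers:
    """Hypothetical fixed-width packed array: docID -> term number."""

    def __init__(self, term_numbers, num_terms):
        self.bits = bits_required(num_terms)
        self.size = len(term_numbers)
        self.mask = (1 << self.bits) - 1
        # Round the total bit count up to whole bytes.
        self.data = bytearray((self.size * self.bits + 7) // 8)
        for doc_id, term in enumerate(term_numbers):
            if not 0 <= term < num_terms:
                raise ValueError("term number out of range")
            self._set(doc_id, term)

    def _set(self, doc_id, term):
        # Entry doc_id occupies bits [doc_id * bits, (doc_id + 1) * bits).
        bit_pos = doc_id * self.bits
        for i in range(self.bits):
            if (term >> i) & 1:
                p = bit_pos + i
                self.data[p >> 3] |= 1 << (p & 7)

    def get(self, doc_id):
        """Random access: read only the bytes that cover this entry."""
        if not 0 <= doc_id < self.size:
            raise IndexError("docID out of range")
        bit_pos = doc_id * self.bits
        first = bit_pos >> 3
        last = (bit_pos + self.bits - 1) >> 3
        chunk = int.from_bytes(self.data[first:last + 1], "little")
        return (chunk >> (bit_pos & 7)) & self.mask


# Example: a field with 5 unique terms needs 3 bits per document.
packed = PackedTermNumbers([0, 4, 2, 1, 3, 4, 0], num_terms=5)
```

In this sketch seven documents fit in 3 bytes instead of one int each, and get() never has to scan earlier entries.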

Perhaps we'd better continue this at LUCENE-1231 or LUCENE-1410.
I think what you're referring to is PDICT, which has frame exceptions
for values that occur infrequently.

Yes let's move the discussion to Jira.

Actually I was referring to simple bit-packing.

For encoding an array of compact enum terms (e.g. city, state, color, zip), I'm guessing the exceptions logic won't buy us much and would hurt the seeking needed for column-stride fields. But we should certainly test it.

Mike
