Yes, StringIndex's public fields make life awkward.  Re initialization - I did 
think you could try use arrays of byte arrays. First 256 terms can be addressed 
using just one byte array, on encountering a 257th term an extra byte array is 
allocated. References to terms then require indexing into 2 byte arrays and bit 
shifting the 2nd byte to produce a comibined short which can address up to 65k 
terms held in a term pool. 

When sorting, a fast comparison of 2 values can avoid always indexing into all 
byte arrays and shifting to produce a number. Simply comparing entries from the 
most significant byte array first can reveal a difference in order, if equal 
then comparing  bytes from the next most significant byte array is required and 
so on. 

Not sure how this would perform compared to simply upgrading whole byte arrays 
to shorts to ints as you go. 

Cheers,
Mark

On 15 Oct 2008, at 00:56, Chris Hostetter <[EMAIL PROTECTED]> wrote:



: Actually looking at this a little deeper maybe Lucene could/should 
: automatically be doing this "short" optimisation here?

At the moment it can't, the array's in StringIndex are public.

The other thing that would be a bit tricky is the initialization ... i 
can't think of any easy way to know in advance how many terms there are 
before iterating over all the terms, so you'd have to assume one and then 
if you're wrong copy to the other -- not sure how expensive thta copy 
would be.

It's a little more feasible for custom clients to do when they know in 
advance how many terms they've got -- but some of the existing 
FieldCacheImpl code could probably be refactoredto make it easier on 
people.



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]






---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to