Yes, StringIndex's public fields make life awkward. Re initialization: I did think you could try using arrays of byte arrays. The first 256 terms can be addressed using just one byte array; on encountering the 257th term, an extra byte array is allocated. References to terms then require indexing into 2 byte arrays and shifting the 2nd byte left by 8 bits to produce a combined 16-bit value (a short, read as unsigned), which can address up to 65,536 terms held in a term pool.
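A minimal Java sketch of the byte-array idea, assuming the per-document term ordinals that StringIndex would otherwise keep in an int array. The class `BytePlaneOrds` and its methods are hypothetical, not Lucene API, and the sketch generalises from two byte arrays to at most four "planes". Extra planes are allocated lazily, and because a new array starts zeroed, ordinals stored before it existed still decode correctly.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene API): per-document term ordinals stored as
 * byte planes. Plane 0 holds the low 8 bits of each ordinal; plane 1 is only
 * allocated once an ordinal >= 256 (the 257th term) is stored, and so on.
 */
public class BytePlaneOrds {
    private final int maxDoc;
    private final List<byte[]> planes = new ArrayList<>();

    public BytePlaneOrds(int maxDoc) {
        this.maxDoc = maxDoc;
        planes.add(new byte[maxDoc]); // one plane covers ordinals 0..255
    }

    public void set(int doc, int ord) {
        // Work out how many bytes this ordinal needs (1 to 4).
        int needed = 1;
        while (needed < 4 && (ord >>> (8 * needed)) != 0) needed++;
        // New planes start zeroed, which is correct for previously stored
        // ordinals, since they all fit in the planes that already existed.
        while (planes.size() < needed) planes.add(new byte[maxDoc]);
        for (int p = 0; p < planes.size(); p++) {
            planes.get(p)[doc] = (byte) (ord >>> (8 * p));
        }
    }

    public int get(int doc) {
        int ord = 0;
        for (int p = 0; p < planes.size(); p++) {
            // & 0xFF reads the byte as unsigned before shifting it into place.
            ord |= (planes.get(p)[doc] & 0xFF) << (8 * p);
        }
        return ord;
    }

    public int planeCount() {
        return planes.size();
    }

    public static void main(String[] args) {
        BytePlaneOrds ords = new BytePlaneOrds(3);
        ords.set(0, 7);
        System.out.println(ords.planeCount());
        ords.set(1, 300);   // needs a second plane
        System.out.println(ords.planeCount());
        ords.set(2, 65535); // still fits in two planes
        System.out.println(ords.get(0) + " " + ords.get(1) + " " + ords.get(2));
    }
}
```

Running `main` prints `1`, `2`, then `7 300 65535`. Each lookup costs one array access per allocated plane, which is the overhead the comparison trick below the sign-off tries to avoid.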
When sorting, a fast comparison of 2 values can avoid always indexing into all the byte arrays and shifting to produce a number. Simply comparing entries from the most significant byte array first can reveal a difference in order; only if they are equal do you need to compare bytes from the next most significant byte array, and so on. Not sure how this would perform compared to simply upgrading whole byte arrays to shorts, then to ints, as you go.

Cheers,
Mark

On 15 Oct 2008, at 00:56, Chris Hostetter <[EMAIL PROTECTED]> wrote:

: Actually looking at this a little deeper maybe Lucene could/should
: automatically be doing this "short" optimisation here?

At the moment it can't, the arrays in StringIndex are public.

The other thing that would be a bit tricky is the initialization ... i can't think of any easy way to know in advance how many terms there are before iterating over all the terms, so you'd have to assume one and then if you're wrong copy to the other -- not sure how expensive that copy would be.

It's a little more feasible for custom clients to do when they know in advance how many terms they've got -- but some of the existing FieldCacheImpl code could probably be refactored to make it easier on people.

-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
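Mark's most-significant-byte-first comparison could look like the following hypothetical helper (`PlaneCompare.compareDocs` is not Lucene API). It assumes each document's ordinal is split across byte arrays with `planes[0]` holding the low byte, and it stops at the first plane where the two documents differ, so most comparisons never touch the lower planes. Whether this beats upgrading whole arrays from byte to short to int is still the open question; this sketch doesn't measure it.

```java
/**
 * Hypothetical sketch (not Lucene API): compare two documents' term ordinals
 * plane by plane, most significant byte array first.
 */
public class PlaneCompare {
    /** planes[0] is the least significant byte; the last plane is the most significant. */
    static int compareDocs(byte[][] planes, int docA, int docB) {
        for (int p = planes.length - 1; p >= 0; p--) {
            int a = planes[p][docA] & 0xFF; // Java bytes are signed; read as unsigned
            int b = planes[p][docB] & 0xFF;
            if (a != b) {
                return a < b ? -1 : 1; // first differing plane decides the order
            }
        }
        return 0; // every plane equal: same ordinal, same term
    }

    public static void main(String[] args) {
        // Ordinals: doc0 = 300 (0x012C), doc1 = 44 (0x002C), doc2 = 300.
        byte[][] planes = {
            {(byte) 0x2C, (byte) 0x2C, (byte) 0x2C}, // low byte
            {(byte) 0x01, (byte) 0x00, (byte) 0x01}  // high byte
        };
        System.out.println(compareDocs(planes, 0, 1)); // decided by the high byte alone
        System.out.println(compareDocs(planes, 1, 0));
        System.out.println(compareDocs(planes, 0, 2)); // needs both planes
    }
}
```

Running `main` prints `1`, `-1`, then `0`. The `& 0xFF` matters: without it a byte such as `0x80` would compare as negative and sort before `0x01`.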