(there are around 6,000,000 posts on the message board database)

Date encoded as yyMMdd: appears to be using around 30 MB
Date encoded as yyMMddHHmmss: appears to be using more than 400 MB!

I could have understood the usage doubling, or even a little more; I don't know how you encode the indexes, if at all, but it's gone up over tenfold, which I can't explain.

Sort memory cost is based on the total number of unique terms in the given field (multiplied by the number of locales involved, if you have to do that too, but for temporal sorting you don't).
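To make that term-count arithmetic concrete, here's a small synthetic simulation (plain Java, not Lucene internals, and not the original poster's data): one year of posts, one every 97 seconds, counting the unique sort terms each encoding produces.

```java
import java.util.HashSet;
import java.util.Set;

public class SortTermCounts {
    // Simulate one year of posts (one every 97 seconds, ~325k posts) and
    // count the unique sort terms produced by each encoding. Synthetic
    // data, purely to illustrate the memory-cost model described above.
    static int[] countTerms() {
        Set<String> full = new HashSet<>();  // yyMMddHHmmss-style, second resolution
        Set<String> date = new HashSet<>();  // yyMMdd-style, one term per day
        Set<String> time = new HashSet<>();  // HHmm, minute resolution
        for (long s = 0; s < 365L * 86400; s += 97) {
            long day = s / 86400, sec = s % 86400;
            String d = String.format("%06d", day);
            String t = String.format("%02d%02d", sec / 3600, (sec % 3600) / 60);
            full.add(d + t + String.format("%02d", sec % 60));
            date.add(d);
            time.add(t);
        }
        return new int[] { full.size(), date.size(), time.size() };
    }

    public static void main(String[] args) {
        int[] c = countTerms();
        System.out.println("second-resolution terms: " + c[0]); // hundreds of thousands
        System.out.println("date terms:              " + c[1]); // 365
        System.out.println("time terms:              " + c[2]); // 1440
    }
}
```

The second-resolution encoding makes nearly every post its own unique term, while date-plus-minute fields top out at 365 + 1,440 terms for the year, which is the difference the memory numbers above are showing.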

This is easier than you think: just use two fields (date, time) and sort by both. The date field's unique term count then grows by only one term per day, and if you can get away with minute resolution for the time field, its total term count stays insignificant. We use this at Aconex on indexes with millions of records (a weekly 'work' searcher refreshed every 5 seconds, an archive searcher held in memory, and a MultiSearcher over the two) and it works a treat. We regularly need to return a million-plus results from a search (don't ask) using this kind of sorting, and the overall search time is only a few seconds.
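The key property of the two-field trick is that comparing (date, time) pairs field-by-field gives exactly the same ordering as comparing the combined timestamp string, since both encodings are fixed-width and lexicographic. A minimal sketch (plain Java comparators, not the Lucene sort API):

```java
import java.util.*;

public class TwoFieldSort {
    // Sort rows by a yyMMdd date field, then an HHmm time field.
    // Field-by-field comparison of fixed-width strings is equivalent to
    // comparing the concatenated yyMMddHHmm value.
    static List<String[]> sortTwoField(List<String[]> rows) {
        rows.sort(Comparator.<String[], String>comparing(r -> r[0])  // date: yyMMdd
                            .thenComparing(r -> r[1]));              // time: HHmm
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>(List.of(
            new String[]{"060312", "0915"},
            new String[]{"060311", "2359"},
            new String[]{"060312", "0100"}));
        for (String[] r : sortTwoField(rows))
            System.out.println(r[0] + r[1]);  // same order as sorting the full string
    }
}
```

In Lucene terms you'd index the two values as separate untokenized fields and pass both to the sort, in that order.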

On a related note, work hard to avoid locale-sensitive sorting for any other fields if you can: for large result sets the CPU penalty is horrific (even once you get past the synchronization bottleneck in the CollationKey stuff).
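If you truly can't avoid locale-sensitive sorting, one standard mitigation (a sketch of the general technique, not what Lucene does internally) is to compute each CollationKey once up front and sort the keys, rather than calling Collator.compare() on every one of the O(n log n) comparisons:

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.*;

public class CollationKeySort {
    // Locale-aware sort that pays the collation cost once per string:
    // build a CollationKey for each value, then sort by the keys, which
    // compare byte-wise without re-collating the source strings.
    static List<String> localeSort(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<CollationKey> keys = new ArrayList<>();
        for (String w : words)
            keys.add(collator.getCollationKey(w));   // one collation pass per string
        keys.sort(Comparator.naturalOrder());        // cheap byte comparisons from here
        List<String> out = new ArrayList<>();
        for (CollationKey k : keys)
            out.add(k.getSourceString());
        return out;
    }

    public static void main(String[] args) {
        // English collation orders case-insensitively at the primary level
        System.out.println(localeSort(
            Arrays.asList("cherry", "Apple", "banana"), Locale.ENGLISH));
    }
}
```

This trades memory for CPU, so it only shifts the pain; not needing the locale-aware sort at all is still the better answer.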

cheers,

Paul Smith
