Thanks very much for this; I'll give it a shot.
Keith.
On 4 Jul 2008, at 00:02, Paul Smith wrote:
(there are around 6,000,000 posts in the message board database)
Date encoded as yyMMdd: appears to be using around 30M
Date encoded as yyMMddHHmmss: appears to be using more than 400M!
I would certainly have understood seeing the usage double, or even
a little more; I have no idea how you guys encode the indexes, if at
all, but it has gone up more than tenfold, which I can't explain.
Sort memory cost is based on the total # of unique terms for the
given field (multiplied by the number of locales involved if you
need locale-sensitive sorting too, but for temporal sorting you don't).
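As a rough illustration of why the second encoding costs so much more, here is a minimal Java sketch that counts the distinct terms each format would produce. The post timestamps are hypothetical (one post every 10 minutes for 3 days), and the class and method names are made up for the example. With day resolution the term count grows with the number of days; with second resolution nearly every post gets its own term, so in the worst case the count approaches the number of documents.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueTermCount {
    // The two encodings from the original question.
    static final DateTimeFormatter BY_DAY = DateTimeFormatter.ofPattern("yyMMdd");
    static final DateTimeFormatter BY_SECOND = DateTimeFormatter.ofPattern("yyMMddHHmmss");

    // Number of distinct terms a field would hold if every timestamp were indexed with fmt.
    static int uniqueTerms(List<LocalDateTime> times, DateTimeFormatter fmt) {
        Set<String> terms = new HashSet<>();
        for (LocalDateTime t : times) {
            terms.add(fmt.format(t));
        }
        return terms.size();
    }

    // Hypothetical data: one post every 10 minutes for 3 days (432 posts).
    static List<LocalDateTime> samplePosts() {
        List<LocalDateTime> posts = new ArrayList<>();
        LocalDateTime start = LocalDateTime.of(2008, 7, 1, 0, 0);
        for (int i = 0; i < 3 * 24 * 6; i++) {
            posts.add(start.plusMinutes(10L * i));
        }
        return posts;
    }

    public static void main(String[] args) {
        List<LocalDateTime> posts = samplePosts();
        System.out.println("posts:              " + posts.size());
        System.out.println("yyMMdd terms:       " + uniqueTerms(posts, BY_DAY));
        System.out.println("yyMMddHHmmss terms: " + uniqueTerms(posts, BY_SECOND));
    }
}
```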
This is easier than you think: just use 2 fields (date, time) and
sort by both. That way the Date field's unique term count grows by
only 1 term per day. The Time field can be set to minute resolution
(if you can get away with that), so it never holds more than 1,440
unique terms (24 x 60), a fairly insignificant total. We use this at
Aconex and have indexes with millions of records (a weekly 'work'
searcher refreshed every 5 seconds, and an archive searcher held in
memory, with a MultiSearcher over the 2), and it works a treat.
We regularly need to return million+ results from a search (don't
ask) using this sort of sorting, and the overall search time is only
a few seconds.
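A minimal sketch of the two-field idea in plain Java, without Lucene. It assumes a date term in yyMMdd and a time term in HHmm (minute resolution); the Post record, the field names and the sample data are hypothetical. Sorting by date and then breaking ties by time gives chronological order because both terms are zero-padded, which is roughly what a Lucene Sort built from two SortFields (date, then time) does. It uses a record, so it needs Java 16 or later.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class DateTimeSplit {
    // Zero-padded formats, so lexicographic order equals chronological order.
    static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyMMdd");
    static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HHmm"); // minute resolution

    // The two terms one post would contribute: one per field (hypothetical layout).
    record Post(String date, String time) {
        static Post of(LocalDateTime t) {
            return new Post(DATE.format(t), TIME.format(t));
        }
    }

    // Newest first: order by date, break ties by time, then reverse the whole ordering.
    static final Comparator<Post> NEWEST_FIRST =
        Comparator.comparing(Post::date).thenComparing(Post::time).reversed();

    // Distinct terms a single field would hold across all posts.
    static int distinct(List<Post> posts, Function<Post, String> field) {
        Set<String> terms = new HashSet<>();
        for (Post p : posts) {
            terms.add(field.apply(p));
        }
        return terms.size();
    }

    // Hypothetical data: one post every 10 minutes for 3 days.
    static List<Post> samplePosts() {
        List<Post> posts = new ArrayList<>();
        LocalDateTime start = LocalDateTime.of(2008, 7, 1, 0, 0);
        for (int i = 0; i < 3 * 24 * 6; i++) {
            posts.add(Post.of(start.plusMinutes(10L * i)));
        }
        return posts;
    }

    public static void main(String[] args) {
        List<Post> posts = samplePosts();
        System.out.println("date terms: " + distinct(posts, Post::date));
        System.out.println("time terms: " + distinct(posts, Post::time));
        List<Post> sorted = new ArrayList<>(posts);
        sorted.sort(NEWEST_FIRST);
        System.out.println("newest: " + sorted.get(0));
    }
}
```

The date field gains one term per day and the time field is capped at 1,440 terms however many posts arrive, so the combined term count stays small while the sort order is unchanged.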
On a related note, work hard to avoid Locale-sensitive sorting on
any other fields if you can; for large result sets the CPU penalty
is horrific (even once you get past the synchronization bottleneck
in the CollationKey stuff).
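To show what Locale-sensitive ordering involves, here is a small sketch using the standard java.text classes (not Lucene's internals; the class and method names are hypothetical). Plain code-point order puts "Banana" before "apple", while an English Collator puts "apple" first, and every comparison in the second case runs through the Collator's rules. Precomputing a CollationKey per string, as the java.text documentation suggests for repeated comparisons, yields the same order but still costs CPU and memory to build the keys.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class LocaleSortCost {
    // Plain code-point order: cheap, and fine whenever locale rules don't matter.
    static List<String> codePointOrder(List<String> in) {
        List<String> out = new ArrayList<>(in);
        out.sort(Comparator.naturalOrder());
        return out;
    }

    // Locale-sensitive order: every comparison goes through the Collator.
    static List<String> collatorOrder(List<String> in, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> out = new ArrayList<>(in);
        out.sort(collator);
        return out;
    }

    // Same order, but each string is converted to a CollationKey once up front,
    // so the sort itself only compares the precomputed keys.
    static List<String> collationKeyOrder(List<String> in, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<CollationKey> keys = new ArrayList<>();
        for (String s : in) {
            keys.add(collator.getCollationKey(s));
        }
        keys.sort(Comparator.naturalOrder());
        List<String> out = new ArrayList<>();
        for (CollationKey k : keys) {
            out.add(k.getSourceString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "Banana", "cherry");
        System.out.println("code point:    " + codePointOrder(words));
        System.out.println("collator:      " + collatorOrder(words, Locale.ENGLISH));
        System.out.println("collation key: " + collationKeyOrder(words, Locale.ENGLISH));
    }
}
```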
cheers,
Paul Smith
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]