(there are around 6,000,000 posts on the message board database)
Date encoded as yyMMdd: appears to be using around 30M
Date encoded as yyMMddHHmmss: appears to be using more than 400M!
I guess I would have understood if the usage had doubled, or even a
little more (no idea how you guys encode the indexes, if at all), but
it's gone up more than tenfold, which I can't explain.
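Sort memory cost is based on the total number of unique terms for the
given field (multiplied by the number of locales involved, if you have
to sort by locale too, but for temporal sorting you don't). With
yyMMddHHmmss almost every post gets its own term, whereas yyMMdd adds
only one new term per day.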
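To make the term-count difference concrete, here is a rough sketch in
plain Java (no Lucene involved) that counts the distinct encoded values
for each pattern. The class name, the helper methods and the randomly
generated timestamps are all assumptions made up for illustration, not
anything from the real message board index.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Illustrative only: counts how many distinct "terms" each date
// encoding would produce for the same set of timestamps.
class UniqueTermCount {

    // Number of distinct strings produced by formatting every
    // timestamp with the given pattern.
    public static int uniqueTerms(LocalDateTime[] stamps, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
        Set<String> terms = new HashSet<>();
        for (LocalDateTime t : stamps) {
            terms.add(t.format(fmt));
        }
        return terms.size();
    }

    // Hypothetical data: n timestamps spread at random over `days`
    // days starting on 2005-01-01 (an arbitrary choice).
    public static LocalDateTime[] randomStamps(int n, int days, long seed) {
        Random rnd = new Random(seed);
        LocalDateTime start = LocalDateTime.of(2005, 1, 1, 0, 0, 0);
        LocalDateTime[] out = new LocalDateTime[n];
        for (int i = 0; i < n; i++) {
            long offset = (long) (rnd.nextDouble() * days * 86_400L);
            out[i] = start.plusSeconds(offset);
        }
        return out;
    }

    public static void main(String[] args) {
        LocalDateTime[] stamps = randomStamps(200_000, 365, 42L);
        System.out.println("yyMMdd:       " + uniqueTerms(stamps, "yyMMdd"));
        System.out.println("HHmm:         " + uniqueTerms(stamps, "HHmm"));
        System.out.println("yyMMddHHmmss: " + uniqueTerms(stamps, "yyMMddHHmmss"));
    }
}
```

Over one year the day-resolution field can never exceed 365 terms and
a minute-resolution time field can never exceed 1,440, while the
second-resolution field ends up with close to one term per post.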
The fix is easier than you might think: use 2 fields (date, time) and
sort by both. The date field's unique term count then grows by only 1
term per day. The time field can be stored at minute resolution (if
you can get away with that), which caps it at 1,440 unique terms, a
fairly insignificant total. We use this at Aconex, and have indexes
with millions of records (a weekly 'work' searcher refreshed every 5
seconds, an archive searcher held in memory, and a MultiSearcher over
the 2), and it works a treat. We regularly need to return million+
results from a search (don't ask) using this sort of sorting, and the
overall search time is only a few seconds.
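As a rough illustration of the two-field idea, the sketch below splits
each timestamp into a yyMMdd date string and an HHmm time string and
orders records by date first, then time. It is plain Java rather than
Lucene code; the Post class and the method names are hypothetical
stand-ins for an indexed document with two sortable fields.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: models a document carrying separate date and
// time fields, sorted by both.
class TwoFieldSort {

    static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyMMdd");
    static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HHmm");

    // Hypothetical stand-in for an indexed post.
    public static final class Post {
        public final String date; // day resolution: one new term per day
        public final String time; // minute resolution: at most 1,440 terms

        public Post(LocalDateTime stamp) {
            this.date = stamp.format(DATE);
            this.time = stamp.format(TIME);
        }
    }

    // Primary key is the date field, secondary key is the time field.
    // Plain string comparison orders these fixed-width values
    // chronologically as long as all years fall in the same century.
    public static final Comparator<Post> BY_DATE_THEN_TIME =
            Comparator.comparing((Post p) -> p.date)
                      .thenComparing(p -> p.time);

    public static List<Post> sorted(List<LocalDateTime> stamps) {
        List<Post> posts = new ArrayList<>();
        for (LocalDateTime s : stamps) {
            posts.add(new Post(s));
        }
        posts.sort(BY_DATE_THEN_TIME);
        return posts;
    }
}
```

The trade-off is that posts within the same minute have no defined
relative order, which is the "if you can get away with that" part.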
On a related note, work hard to avoid locale-sensitive sorting on any
other fields if you can; for large result sets the CPU penalty is
horrific (even once you get past the synchronization bottleneck in the
CollationKey stuff).
cheers,
Paul Smith