(there are around 6,000,000 posts on the message board database)

Date encoded as yyMMdd: appears to be using around 30 MB
Date encoded as yyMMddHHmmss: appears to be using more than 400 MB!

I could have understood the usage doubling, or even a little more; I don't know how you encode the indexes, if at all, but it's gone up over tenfold, which I can't explain.

Sort memory cost is based on the total number of unique terms in the given field (multiplied by the number of locales involved, if you have to do that too, but for temporal sorting you don't).
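To make that term-count arithmetic concrete, here's a small synthetic simulation (plain Java, not Lucene internals, and not the original poster's data): one year of posts, one every 97 seconds, counting the unique sort terms each encoding produces.

```java
import java.util.HashSet;
import java.util.Set;

public class SortTermCounts {
    // Simulate one year of posts (one every 97 seconds, ~325k posts) and
    // count the unique sort terms produced by each encoding. Synthetic
    // data, purely to illustrate the memory-cost model described above.
    static int[] countTerms() {
        Set<String> full = new HashSet<>();  // yyMMddHHmmss-style, second resolution
        Set<String> date = new HashSet<>();  // yyMMdd-style, one term per day
        Set<String> time = new HashSet<>();  // HHmm, minute resolution
        for (long s = 0; s < 365L * 86400; s += 97) {
            long day = s / 86400, sec = s % 86400;
            String d = String.format("%06d", day);
            String t = String.format("%02d%02d", sec / 3600, (sec % 3600) / 60);
            full.add(d + t + String.format("%02d", sec % 60));
            date.add(d);
            time.add(t);
        }
        return new int[] { full.size(), date.size(), time.size() };
    }

    public static void main(String[] args) {
        int[] c = countTerms();
        System.out.println("second-resolution terms: " + c[0]); // hundreds of thousands
        System.out.println("date terms:              " + c[1]); // 365
        System.out.println("time terms:              " + c[2]); // 1440
    }
}
```

The second-resolution encoding makes nearly every post its own unique term, while date-plus-minute fields top out at 365 + 1,440 terms for the year, which is the difference the memory numbers above are showing.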

This is easier than you think: just use two fields (date, time) and sort by both. The date field's unique term count then grows by only one term per day, and if you can get away with minute resolution for the time field, its total term count stays insignificant. We use this at Aconex on indexes with millions of records (a weekly 'work' searcher refreshed every 5 seconds, an archive searcher held in memory, and a MultiSearcher over the two) and it works a treat. We regularly need to return a million-plus results from a search (don't ask) using this kind of sorting, and the overall search time is only a few seconds.
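The key property of the two-field trick is that comparing (date, time) pairs field-by-field gives exactly the same ordering as comparing the combined timestamp string, since both encodings are fixed-width and lexicographic. A minimal sketch (plain Java comparators, not the Lucene sort API):

```java
import java.util.*;

public class TwoFieldSort {
    // Sort rows by a yyMMdd date field, then an HHmm time field.
    // Field-by-field comparison of fixed-width strings is equivalent to
    // comparing the concatenated yyMMddHHmm value.
    static List<String[]> sortTwoField(List<String[]> rows) {
        rows.sort(Comparator.<String[], String>comparing(r -> r[0])  // date: yyMMdd
                            .thenComparing(r -> r[1]));              // time: HHmm
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>(List.of(
            new String[]{"060312", "0915"},
            new String[]{"060311", "2359"},
            new String[]{"060312", "0100"}));
        for (String[] r : sortTwoField(rows))
            System.out.println(r[0] + r[1]);  // same order as sorting the full string
    }
}
```

In Lucene terms you'd index the two values as separate untokenized fields and pass both to the sort, in that order.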

On a related note, work hard to avoid locale-sensitive sorting for any other fields if you can: for large result sets the CPU penalty is horrific (even once you get past the synchronization bottleneck in the CollationKey stuff).
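If you truly can't avoid locale-sensitive sorting, one standard mitigation (a sketch of the general technique, not what Lucene does internally) is to compute each CollationKey once up front and sort the keys, rather than calling Collator.compare() on every one of the O(n log n) comparisons:

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.*;

public class CollationKeySort {
    // Locale-aware sort that pays the collation cost once per string:
    // build a CollationKey for each value, then sort by the keys, which
    // compare byte-wise without re-collating the source strings.
    static List<String> localeSort(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<CollationKey> keys = new ArrayList<>();
        for (String w : words)
            keys.add(collator.getCollationKey(w));   // one collation pass per string
        keys.sort(Comparator.naturalOrder());        // cheap byte comparisons from here
        List<String> out = new ArrayList<>();
        for (CollationKey k : keys)
            out.add(k.getSourceString());
        return out;
    }

    public static void main(String[] args) {
        // English collation orders case-insensitively at the primary level
        System.out.println(localeSort(
            Arrays.asList("cherry", "Apple", "banana"), Locale.ENGLISH));
    }
}
```

This trades memory for CPU, so it only shifts the pain; not needing the locale-aware sort at all is still the better answer.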

cheers,

Paul Smith
