The Trie type can be tuned for range queries vs. single queries. This seems to be explained in email and nowhere else:
http://www.lucidimagination.com/search/document/c501f59515a9eece

On Mon, May 21, 2012 at 12:54 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> On Thu, 2012-05-17 at 23:03 +0200, Robert Bart wrote:
>> I am running Lucene 3.6 in a system that indexes about 4 billion documents
>> across several indexes, and I'm hoping to get documents in order of a
>> certain NumericField.
>
> What is the maximum size on any single index, in terms of number of
> documents? What is the type of the NumericField?
>
>> I've tried using Lucene's Sort implementation, but it looks like it tries
>> to do the entire sort in memory by allocating a huge array with space for
>> every document in the index.
>
> The FieldCache allocates an array of length #documents with the same
> type that your NumericField is. The sort itself is of the sliding window
> type, meaning that it only takes up memory linearly to the number of
> documents wanted in the response. Do you require millions of documents
> to be returned as part of a search?
>
> Sanity check: You do specify the type when performing a sorted search,
> right? If not, the values will be treated as Strings.
>
>> On my index, this quickly runs out of memory.
>
> Assuming that your largest index is 1B documents and that your
> NumericField is of type Integer, the FieldCache's values for the sort
> should take up 1B * 4 = 4GB. Are you hoping for less?
>
>> Are there any alternatives or better ways of getting documents in order of
>> a NumericField for a very large index?
>
> Be sure to select the type of NumericField to be as small as possible.
> If you have few unique sort values (e.g. 17, 80, 2000 and 5678), you
> might map them down (to 0, 1, 2 and 3 for this example) and store them
> as a byte.
>
> Currently Lucene only supports atomic types for numerics in the
> FieldCache, so the smallest one is byte. It is possible to use only
> ceil(log2(#unique_values)) bits/document, although that requires a bit
> of custom coding.
>
> Regards,
> Toke Eskildsen
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org

--
Lance Norskog
goks...@gmail.com
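
For what it's worth, the arithmetic in the quoted message (one array slot per document, mapping a few unique sort values down to small ordinals, and ceil(log2(#unique_values)) bits per document) can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names (SortValueMapping, fieldCacheBytes, uniqueSorted, toOrdinals, bitsPerDocument) are made up for this example and are not part of Lucene, and nothing here touches the real FieldCache.

```java
import java.util.Arrays;

// Hypothetical helper, not a Lucene API: illustrates the memory arithmetic
// and the "map unique values down to small ordinals" idea from the thread.
public class SortValueMapping {

    // One array slot per document, each slot as wide as the numeric type.
    public static long fieldCacheBytes(long numDocs, int bytesPerValue) {
        return numDocs * bytesPerValue;
    }

    // Sorted distinct values; a value's index in this array is its ordinal,
    // so comparing ordinals gives the same order as comparing values.
    public static int[] uniqueSorted(int[] values) {
        return Arrays.stream(values).distinct().sorted().toArray();
    }

    // Replace each value by its ordinal, stored in a byte. Limited to 128
    // unique values here so that signed byte comparison keeps the sort order.
    public static byte[] toOrdinals(int[] values, int[] unique) {
        if (unique.length > Byte.MAX_VALUE + 1) {
            throw new IllegalArgumentException("too many unique values for a byte");
        }
        byte[] ordinals = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            int ordinal = Arrays.binarySearch(unique, values[i]);
            if (ordinal < 0) {
                throw new IllegalArgumentException("value not in unique set: " + values[i]);
            }
            ordinals[i] = (byte) ordinal;
        }
        return ordinals;
    }

    // ceil(log2(uniqueValues)): bits needed per document with bit packing.
    public static int bitsPerDocument(int uniqueValues) {
        if (uniqueValues < 1) {
            throw new IllegalArgumentException("need at least one unique value");
        }
        return 32 - Integer.numberOfLeadingZeros(uniqueValues - 1);
    }

    public static void main(String[] args) {
        // 1B documents with a 4-byte int per document.
        System.out.println(fieldCacheBytes(1_000_000_000L, 4) + " bytes as int");
        // The same documents with a 1-byte ordinal per document.
        System.out.println(fieldCacheBytes(1_000_000_000L, 1) + " bytes as byte");

        int[] values = {2000, 17, 5678, 80, 17};
        int[] unique = uniqueSorted(values);
        System.out.println(Arrays.toString(unique));
        System.out.println(Arrays.toString(toOrdinals(values, unique)));
        System.out.println(bitsPerDocument(unique.length) + " bits/document");
    }
}
```

With the four example values from the message, the ordinals are 0 to 3, so two bits per document would be enough in principle. The byte version is the smallest option that works without custom packing.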