On Thu, 2012-05-17 at 23:03 +0200, Robert Bart wrote:
> I am running Lucene 3.6 in a system that indexes about 4 billion documents
> across several indexes, and I'm hoping to get documents in order of a
> certain NumericField.

What is the maximum size of any single index, in terms of number of
documents? What is the type of the NumericField?

> I've tried using Lucene's Sort implementation, but it looks like it tries
> to do the entire sort in memory by allocating a huge array with space for
> every document in the index.

The FieldCache allocates an array of length #documents, of the same type
as your NumericField. The sort itself is of the sliding window type,
meaning that it only takes up memory linear in the number of documents
wanted in the response. Do you require millions of documents to be
returned as part of a search?

Sanity check: You do specify the type when performing a sorted search,
right? If not, the values will be treated as Strings.

> On my index, this quickly runs out of memory.

Assuming that your largest index is 1B documents and that your
NumericField is of type Integer, the FieldCache's values for the sort
should take up 1B * 4 bytes = 4GB. Are you hoping for less?

> Are there any alternatives or better ways of getting documents in order of
> a NumericField for a very large index?

Be sure to select the type of NumericField to be as small as possible. If
you have few unique sort values (e.g. 17, 80, 2000 and 5678), you might
map them down (to 0, 1, 2 and 3 for this example) and store them as a
byte.

Currently Lucene only supports primitive types for numerics in the
FieldCache, so the smallest one is byte. It is possible to use only
ceil(log2(#unique_values)) bits/document, although that requires a bit of
custom coding.

Regards,
Toke Eskildsen
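
To illustrate the last point, here is a minimal sketch of the mapping and
bit-packing idea in plain Java (8 or later). It is not Lucene code and does
not plug into the FieldCache: the class PackedOrdinals and its methods are
hypothetical names made up for this example. It maps each distinct sort
value to its rank (so comparing ordinals gives the same order as comparing
the original values) and stores each rank in ceil(log2(#unique_values))
bits inside a long[].

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Lucene: one packed ordinal per document. */
final class PackedOrdinals {
    private final int[] uniqueSorted;   // ordinal -> original sort value
    private final int bitsPerValue;     // ceil(log2(#unique_values)), at least 1
    private final long mask;            // lowest bitsPerValue bits set
    private final long[] blocks;        // packed ordinals, 64 bits per block

    PackedOrdinals(int[] valuePerDoc) {
        // Distinct values in ascending order, so ordinal order equals value order.
        uniqueSorted = Arrays.stream(valuePerDoc).distinct().sorted().toArray();
        // Number of bits needed to represent the largest ordinal.
        bitsPerValue = Math.max(1,
                32 - Integer.numberOfLeadingZeros(uniqueSorted.length - 1));
        mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) valuePerDoc.length * bitsPerValue;
        blocks = new long[(int) ((totalBits + 63) / 64)];
        for (int doc = 0; doc < valuePerDoc.length; doc++) {
            set(doc, Arrays.binarySearch(uniqueSorted, valuePerDoc[doc]));
        }
    }

    // Writes into a zero-initialised array only, so OR-ing the bits in is enough.
    private void set(int doc, long ordinal) {
        long bitPos = (long) doc * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] |= ordinal << shift;
        if (shift + bitsPerValue > 64) {
            // The value straddles two blocks: store the high bits in the next one.
            blocks[block + 1] |= ordinal >>> (64 - shift);
        }
    }

    /** Rank of the document's sort value among all distinct values. */
    int ordinal(int doc) {
        long bitPos = (long) doc * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {
            value |= blocks[block + 1] << (64 - shift);
        }
        return (int) (value & mask);
    }

    /** Original sort value of the document. */
    int value(int doc) {
        return uniqueSorted[ordinal(doc)];
    }

    int bitsPerValue() {
        return bitsPerValue;
    }
}
```

With the four example values above this uses 2 bits/document, so 1B
documents would need about 250MB for the packed ordinals, compared to 1GB
as bytes or 4GB as ints. Note that this sketch takes a plain int[] as input
for simplicity; for a really large index the values would have to be
streamed in rather than held in an array first.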