Sort runs out of memory

2012-05-17 Thread Robert Bart
Hi all, I am running Lucene 3.6 in a system that indexes about 4 billion documents across several indexes, and I'm hoping to get documents in order of a certain NumericField. I've tried using Lucene's Sort implementation, but it looks like it tries to do the entire sort in memory by allocating a
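
One general way to order more values than fit on the heap is an external merge sort: sort fixed-size chunks in memory, spill each chunk to a temporary file, then merge the files with a small priority queue. The sketch below is plain Java (8 or later), not Lucene's Sort API, and it is not something the post itself proposes. The class name `ExternalLongSort`, the `chunkSize` parameter, and the idea of feeding it the field values as a stream of longs are all assumptions made for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.LongConsumer;

// Hypothetical helper, not part of Lucene: sorts a stream of longs while
// holding at most chunkSize values in memory at any time.
final class ExternalLongSort {

    /** A sorted run on disk, exposing its smallest unread value in head. */
    private static final class Run implements Comparable<Run> {
        final DataInputStream in;
        long head;

        Run(Path file) throws IOException {
            in = new DataInputStream(new BufferedInputStream(Files.newInputStream(file)));
        }

        /** Reads the next value into head; returns false (and closes) at end of run. */
        boolean advance() throws IOException {
            try {
                head = in.readLong();
                return true;
            } catch (EOFException e) {
                in.close();
                return false;
            }
        }

        @Override
        public int compareTo(Run other) {
            return Long.compare(head, other.head);
        }
    }

    /** Sorts buf[0..n) in memory and writes it to a new temporary file. */
    private static Path spill(long[] buf, int n) throws IOException {
        Arrays.sort(buf, 0, n);
        Path file = Files.createTempFile("sort-run", ".bin");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (int i = 0; i < n; i++) {
                out.writeLong(buf[i]);
            }
        }
        return file;
    }

    /** Passes every input value to sink in ascending order. */
    public static void sort(Iterator<Long> values, int chunkSize, LongConsumer sink)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Phase 1: cut the input into sorted runs on disk.
        List<Path> files = new ArrayList<>();
        long[] buf = new long[chunkSize];
        int n = 0;
        while (values.hasNext()) {
            buf[n++] = values.next();
            if (n == chunkSize) {
                files.add(spill(buf, n));
                n = 0;
            }
        }
        if (n > 0) {
            files.add(spill(buf, n));
        }
        // Phase 2: k-way merge; the heap holds one entry per run.
        PriorityQueue<Run> heap = new PriorityQueue<>();
        for (Path file : files) {
            Run run = new Run(file);
            if (run.advance()) {
                heap.add(run);
            }
        }
        while (!heap.isEmpty()) {
            Run smallest = heap.poll();
            sink.accept(smallest.head);
            if (smallest.advance()) {
                heap.add(smallest);
            }
        }
        for (Path file : files) {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        Iterator<Long> input = Arrays.asList(5L, -1L, 9L, 3L, 3L, 0L, 7L).iterator();
        sort(input, 3, v -> System.out.println(v));
    }
}
```

In a real index the sort key would have to travel with its document ID, so each record would be a (value, docID) pair rather than a bare long. The structure of the two phases stays the same.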

Re: Retrieving large numbers of documents from several disks in parallel

2011-12-27 Thread Robert Bart
> of view, it would seem like the order in which
> the documents are read is very significant for the reading speed (feel the
> random access jump as being the issue).
> >>
> >> You could:
> >> - move to ram-disk or ssd to make a difference?
> >>
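
The quoted point about read order can be illustrated with a small sketch: fetch the hits in ascending document-ID order so that reads move forward through the index, then hand the results back in their original ranking order. This assumes that ascending document ID roughly matches the on-disk order of stored documents, which should be checked for the index at hand. `ReadInDocIdOrder` and the `loader` callback are hypothetical names standing in for whatever actually reads one stored document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper: loads documents in ascending doc-ID order but
// returns them in the order the IDs were given (for example, by score).
final class ReadInDocIdOrder {

    public static <T> List<T> load(int[] docIds, IntFunction<T> loader) {
        // ranks[i] is a position in docIds; sort the positions by doc ID.
        Integer[] ranks = new Integer[docIds.length];
        for (int i = 0; i < ranks.length; i++) {
            ranks[i] = i;
        }
        Arrays.sort(ranks, Comparator.comparingInt((Integer r) -> docIds[r]));

        // Pre-size the result so each document can be stored at its original rank.
        List<T> results = new ArrayList<>(Collections.<T>nCopies(docIds.length, null));
        for (int rank : ranks) {
            results.set(rank, loader.apply(docIds[rank]));
        }
        return results;
    }

    public static void main(String[] args) {
        int[] hits = {42, 7, 19};
        // Stand-in loader; a real one would read the stored document.
        List<String> docs = load(hits, id -> "doc" + id);
        System.out.println(docs);
    }
}
```

Whether this helps depends on how scattered the hits are. It does not remove seeks, it only makes them monotonic.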

Retrieving large numbers of documents from several disks in parallel

2011-12-21 Thread Robert Bart
Hi All, I am running Lucene 3.4 in an application that indexes about 1 billion factual assertions (Documents) from the web over four separate disks, so that each disk has a separate index of about 250 million documents. The Documents are relatively small, less than 1KB each. These indexes provide
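
With one index per disk, a natural arrangement is one worker thread per disk so that every spindle is busy at the same time. The following is a minimal sketch of that arrangement using only `java.util.concurrent`. It does not use any Lucene classes, and `ParallelDiskFetch`, `fetchAll`, and the per-disk `Callable` tasks are hypothetical names: each task stands in for "run the query against one index and return its documents".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: runs one retrieval task per disk concurrently
// and concatenates the results in task order.
final class ParallelDiskFetch {

    public static List<String> fetchAll(List<? extends Callable<List<String>>> perDiskTasks)
            throws InterruptedException, ExecutionException {
        if (perDiskTasks.isEmpty()) {
            return Collections.emptyList();
        }
        // One thread per disk, so no disk waits behind another disk's reads.
        ExecutorService pool = Executors.newFixedThreadPool(perDiskTasks.size());
        try {
            // invokeAll blocks until every task finishes and returns the
            // futures in the same order as the submitted tasks.
            List<Future<List<String>>> futures = pool.invokeAll(perDiskTasks);
            List<String> all = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                all.addAll(future.get());
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in tasks; real ones would each search the index on one disk.
        List<Callable<List<String>>> tasks = new ArrayList<>();
        tasks.add(() -> Arrays.asList("disk0-docA", "disk0-docB"));
        tasks.add(() -> Arrays.asList("disk1-docC"));
        System.out.println(fetchAll(tasks));
    }
}
```

Collecting everything into one list is only reasonable for small result sets. For large retrievals each task would more plausibly stream its documents to a consumer as it reads them.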