I've personally indexed over 1,000,000 documents and Lucene doesn't even
breathe hard.
We are in the hundreds of millions and growing, and Lucene does tend
to sweat a little bit, although it can certainly handle it.
You're going to have to understand the internals of Lucene a bit
more.
On 6/9/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: I have an integer field that I've indexed after converting to a string
: using NumberTools.longToString().
: Now I want to sort my results using this field. Everything works when
: treating the field as a string, but is very slow and memor
I have an integer field that I've indexed after converting to a string
using NumberTools.longToString().
Now I want to sort my results using this field. Everything works when
treating the field as a string, but is very slow and memory intensive.
I want to use INT sorting instead, but these strin
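As background for the question above, here is a minimal sketch of why numbers are converted to fixed-width strings before indexing: padding every value to the same width makes lexicographic order agree with numeric order. This is not NumberTools' actual encoding. The class and method names (SortableLong, encode, decode) are hypothetical, and the sketch assumes non-negative values only.

```java
// Hypothetical illustration, NOT the NumberTools implementation.
// Assumes non-negative longs; negative values would need a sign prefix scheme.
class SortableLong {
    // Long.MAX_VALUE has 19 decimal digits, so pad every value to 19 characters.
    static final int WIDTH = 19;

    // Encode a non-negative long as a zero-padded decimal string, so that
    // String.compareTo on two encodings matches numeric order.
    static String encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only");
        }
        return String.format("%0" + WIDTH + "d", value);
    }

    // Decode the padded string back to the original long.
    static long decode(String encoded) {
        return Long.parseLong(encoded);
    }
}
```

Without padding, "10" sorts before "5" as a string. With padding, encode(5) sorts before encode(10), which is what a string sort on the field relies on.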
On 6/7/06, Benjamin Stein <[EMAIL PROTECTED]> wrote:
During indexing, I have been using a RAMDirectory to store many thousands
of documents in memory before flushing the buffer to disk using
IndexWriter.addIndexes.
For the most part this works very well, except that performance de
I have a very large corpus that I am storing in many indexes: 200 indexes
* ~500MB each, with 10^6 very tiny documents in each. (I could look into
optimizing this later, of course, but seems ok for now)
During indexing, I have been using a RAMDirectory to store many thousands of
documents in mem
> -Original Message-
> From: karl wettin [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, May 23, 2006 6:44 PM
> To: java-user@lucene.apache.org
> Subject: Re: Removing search results that fall within a time range
>
> On Tue, 2006-05-23 at 17:38 -0400, Benjamin Stei
I have a requirement to only return one result for all documents whose
timestamps fall within N seconds of one another. (where timestamp is a
field and N is an integer).
For example, Document A is timestamped "12:00:00" and Document B has
timestamp "12:00:30", Document B should be discarded. On t
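One possible reading of the requirement above, sketched outside of Lucene as a post-processing step: sort the timestamps and keep a document only if it is more than N seconds later than the last document that was kept. The names TimestampDedup and dedupe are hypothetical, timestamps are assumed to be in seconds, and comparing against the last kept document (rather than the immediately preceding one) is an assumption, since the original message is cut off before it settles that.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class TimestampDedup {
    // Returns the timestamps that survive: the earliest one, plus every later
    // one that is MORE than windowSeconds after the last kept timestamp.
    static List<Long> dedupe(List<Long> timestampsSeconds, long windowSeconds) {
        // Work on a sorted copy so the caller's list is left untouched.
        List<Long> sorted = new ArrayList<>(timestampsSeconds);
        Collections.sort(sorted);

        List<Long> kept = new ArrayList<>();
        for (long ts : sorted) {
            // Assumption: "within N seconds" is inclusive, so a gap of exactly
            // windowSeconds still counts as a duplicate.
            if (kept.isEmpty() || ts - kept.get(kept.size() - 1) > windowSeconds) {
                kept.add(ts);
            }
        }
        return kept;
    }
}
```

With N = 60, timestamps for 12:00:00 and 12:00:30 (43200 and 43230 seconds after midnight) collapse to the first one, matching the Document A / Document B example.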