Re: Lucene as syslog storage

2006-06-20 Thread Benjamin Stein
I've personally indexed over 1,000,000 documents and Lucene doesn't even breathe hard. We are in the hundreds of millions and growing, and Lucene does tend to sweat a little, although it can certainly handle it. You're going to have to understand the internals of Lucene a bit more.

Re: Numbertools and efficient sorting

2006-06-10 Thread Benjamin Stein
On 6/9/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: I have an integer field that I've indexed after converting to a string
: using NumberTools.longToString().
: Now I want to sort my results using this field. Everything works when
: treating the field as a string, but is very slow and memor

Numbertools and efficient sorting

2006-06-09 Thread Benjamin Stein
I have an integer field that I've indexed after converting to a string using NumberTools.longToString(). Now I want to sort my results using this field. Everything works when treating the field as a string, but is very slow and memory intensive. I want to use INT sorting instead, but these strin
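
For readers unfamiliar with why a string-encoded number can be sorted at all: NumberTools produces fixed-width strings whose lexicographic order matches numeric order. The sketch below illustrates that idea in plain Java for non-negative values only. It is not Lucene's implementation; the class name SortableLong, the WIDTH constant, and both method names are hypothetical. It also shows that the string can be decoded back to a long if a numeric comparison is wanted.

```java
// Illustrative sketch only: NOT Lucene's NumberTools, just the same general idea.
// Handles non-negative longs; negative values would need a sign prefix scheme.
class SortableLong {
    // Long.MAX_VALUE needs 13 digits in base 36, so pad every value to that width.
    static final int WIDTH = 13;

    /** Encodes a non-negative long as a zero-padded, fixed-width base-36 string. */
    static String encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch supports non-negative values only");
        }
        String digits = Long.toString(value, Character.MAX_RADIX);
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    /** Decodes a string produced by encode() back into a long. */
    static long decode(String encoded) {
        return Long.parseLong(encoded, Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        // Unpadded decimal strings sort incorrectly: "9" comes after "10".
        System.out.println("9".compareTo("10") > 0);
        // Padded fixed-width strings sort in numeric order.
        System.out.println(encode(9).compareTo(encode(10)) < 0);
        // Round trip back to a number.
        System.out.println(decode(encode(123456789L)));
    }
}
```

The fixed width is what makes string comparison agree with numeric comparison; the cost is that every comparison works on strings rather than on primitive values.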

Re: IndexWriter.addIndexes & optimization

2006-06-07 Thread Benjamin Stein
On 6/7/06, Benjamin Stein <[EMAIL PROTECTED]> wrote: During indexing, I have been using a RAMDirectory to store many thousands of documents in memory before flushing the buffer to disk using IndexWriter.addIndexes. For the most part this works very well, except that performance de

IndexWriter.addIndexes & optimization

2006-06-07 Thread Benjamin Stein
I have a very large corpus that I am storing in many indexes: 200 indexes * ~500MB each, with 10^6 very tiny documents in each. (I could look into optimizing this later, of course, but it seems OK for now.) During indexing, I have been using a RAMDirectory to store many thousands of documents in mem

RE: Removing search results that fall within a time range

2006-05-23 Thread Benjamin Stein
> -Original Message- > From: karl wettin [mailto:[EMAIL PROTECTED] > Sent: Tuesday, May 23, 2006 6:44 PM > To: java-user@lucene.apache.org > Subject: Re: Removing search results that fall within a time range > > On Tue, 2006-05-23 at 17:38 -0400, Benjamin Stei

Removing search results that fall within a time range

2006-05-23 Thread Benjamin Stein
I have a requirement to return only one result for all documents whose timestamps fall within N seconds of one another (where timestamp is a field and N is an integer). For example, if Document A is timestamped "12:00:00" and Document B is timestamped "12:00:30", Document B should be discarded. On t
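
One possible reading of this requirement, sketched in plain Java as a post-processing step over the timestamps of the hits, without any Lucene API: sort the timestamps, then keep a hit only if it is more than N seconds after the last hit that was kept. The class and method names are hypothetical, timestamps are assumed to be seconds, and anchoring the window on the last kept hit (rather than on the previous hit) is an assumption; the original message does not say which is intended.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: names and the windowing rule are assumptions.
class TimeWindowDedup {
    /**
     * Returns the timestamps that survive de-duplication. A timestamp is kept
     * only if it is more than windowSeconds after the most recently kept one.
     */
    static List<Long> dedupe(long[] timestampsSeconds, long windowSeconds) {
        long[] sorted = timestampsSeconds.clone();
        Arrays.sort(sorted);
        List<Long> kept = new ArrayList<>();
        for (long ts : sorted) {
            if (kept.isEmpty() || ts - kept.get(kept.size() - 1) > windowSeconds) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long noon = 12 * 3600; // 12:00:00 as seconds since midnight
        // 12:00:00, 12:00:30, 12:01:30, 12:01:40 with N = 60
        long[] hits = { noon, noon + 30, noon + 90, noon + 100 };
        System.out.println(dedupe(hits, 60)); // prints [43200, 43290]
    }
}
```

With this rule, 12:00:30 is dropped because it is within 60 seconds of 12:00:00, and 12:01:40 is dropped because it is within 60 seconds of 12:01:30, which was kept.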