Hi,

Thanks for your response. I am adding my answers inline, marked with ****.
> How much work is it to parse the log files? What kind of hardware are
> you using? Are you accessing things over a network? Is there network
> latency involved?

**** I don't believe the answers to these questions affect Lucene's indexing performance itself. I am starting from the assumption that I already have key/value pairs of properties such as Source, Severity, Time, and Message.

> Can you index various parts in parallel?

**** Yes, but I need to test whether Lucene allows indexing into the same directory from multiple threads.

> How many documents are we talking about? You've given no data on whether
> you expect 100 documents or 1,000,000,000,000,000 documents. How fast is
> the data being added to your syslogs?

**** It is a sequential stream of data; I need to find the maximum possible indexing rate for a given piece of hardware.

I am able to index 500 logs/sec with the following configuration, running this Java process alone on the machine:

- Log record size: 250 bytes
- Number of fields: 14
- 1.8 GHz CPU, 512 MB RAM, Mandrake Linux
- MERGE_FACTOR = 100
- MAX_BUFFER_DOCS = 250

Lucene takes about 500 bytes to index/store each log record. But I still need to find the optimal index size, so that I can create a new index directory whenever the current index reaches that size. I am asking mainly to find out whether anybody has faced the same problem; if so, I would welcome expert advice.

with regards,
MSK

On 1/29/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
> That depends (tm Erik Hatcher) <G>. The problem with such an open-ended
> question is that there are so many unique variables that it's impossible
> to answer in any meaningful way. For instance....
>
> How much work is it to parse the log files? What kind of hardware are
> you using? Are you accessing things over a network? Is there network
> latency involved? Can you index various parts in parallel? How many
> documents are we talking about? You've given no data on whether you
> expect 100 documents or 1,000,000,000,000,000 documents. How fast is the
> data being added to your syslogs? And on and on and on.
>
> All you can do is set up a test to see. It shouldn't be very hard to,
> say, create a small program that randomizes input, push it through the
> indexing process, and measure.
>
> Best
> Erick
>
> On 1/29/07, Saravana <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> >
> > Did anybody use lucene to index syslogs? What is the maximum indexing
> > rate that we can get to store a 200 bytes document with 14 fields?
> >
> > thanks,
> > MSK
> >
> > --
> > Every day brings us a sea of opportunities
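P.S. A minimal sketch of the kind of harness Erick describes: generate randomized 14-field records of roughly 250 bytes and time how fast the index step consumes them. The class and field names (`SyslogIndexBench`, `prop0`..`prop9`, etc.) are made up for illustration, and the actual Lucene call is left as a comment (showing the 2.x-era `Document`/`Field` API) so the harness runs without any external jar; swap the real `writer.addDocument(doc)` in to measure true indexing rate on your hardware.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/**
 * Throughput harness: generate randomized syslog-like records and
 * time the indexing loop. The Lucene call is stubbed out in a comment
 * so this sketch has no external dependencies.
 */
public class SyslogIndexBench {
    private static final String[] SEVERITIES = {"DEBUG", "INFO", "WARN", "ERROR"};

    /** Build one synthetic record with 14 fields, roughly 250 bytes total. */
    static Map<String, String> randomRecord(Random rnd) {
        Map<String, String> rec = new LinkedHashMap<>();
        rec.put("source", "host-" + rnd.nextInt(100));
        rec.put("severity", SEVERITIES[rnd.nextInt(SEVERITIES.length)]);
        rec.put("time", Long.toString(System.currentTimeMillis()));
        // Ten generic key/value properties to reach 14 fields in total.
        for (int i = 0; i < 10; i++) {
            rec.put("prop" + i, "v" + rnd.nextInt(1000));
        }
        // Pad the message so each record lands in the ~250-byte range.
        StringBuilder msg = new StringBuilder("syslog message ");
        while (msg.length() < 120) {
            msg.append((char) ('a' + rnd.nextInt(26)));
        }
        rec.put("message", msg.toString());
        return rec;
    }

    /** Push 'docs' records through the (stubbed) index step; return docs/sec. */
    static double measure(int docs) {
        Random rnd = new Random(42);
        long start = System.nanoTime();
        for (int i = 0; i < docs; i++) {
            Map<String, String> rec = randomRecord(rnd);
            // With Lucene on the classpath this is roughly where you would do:
            //   Document doc = new Document();
            //   for (Map.Entry<String, String> e : rec.entrySet())
            //       doc.add(new Field(e.getKey(), e.getValue(),
            //                         Field.Store.YES, Field.Index.UN_TOKENIZED));
            //   writer.addDocument(doc);
            if (rec.size() != 14) {
                throw new IllegalStateException("expected 14 fields");
            }
        }
        double secs = (System.nanoTime() - start) / 1e9;
        return docs / secs;
    }

    public static void main(String[] args) {
        System.out.println("docs/sec (record generation only): "
                + (long) measure(10000));
    }
}
```

Varying MERGE_FACTOR and MAX_BUFFER_DOCS across runs of a harness like this, with the real `addDocument` call in place, is the most direct way to find the optimal settings and index size for a given machine.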