Hi,

Thanks for your response. I am adding my answers inline, marked with ****.

How much work is it to parse the log files?
What kind of hardware are you using?
Are you accessing things over a network? Is there network latency involved?

******* I believe the answers to the above questions will not affect Lucene
indexing performance. I am starting from the assumption that I already have
key/value pairs of properties such as Source, Severity, Time, and Message.

Can you index various parts in parallel?

****** Yes, but I need to test whether Lucene allows indexing into the same
directory from multiple threads.
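
As far as I understand, a single IndexWriter can be shared safely by several
threads, since addDocument() calls are serialized internally. A sketch of that
test, assuming the Lucene 2.x-era API (the index path, field name, thread
count, and document counts are all made up for illustration):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class ParallelIndexer {
    public static void main(String[] args) throws Exception {
        // One writer shared by all threads -- not one writer per thread
        // on the same directory.
        final IndexWriter writer =
            new IndexWriter("/tmp/syslog-index", new StandardAnalyzer(), true);
        Thread[] threads = new Thread[4]; // thread count is arbitrary
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        for (int n = 0; n < 10000; n++) {
                            Document doc = new Document();
                            doc.add(new Field("Message", "sample log line " + n,
                                    Field.Store.YES, Field.Index.TOKENIZED));
                            writer.addDocument(doc); // safe to call concurrently
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
            threads[i].start();
        }
        for (int i = 0; i < threads.length; i++) {
            threads[i].join();
        }
        writer.close();
    }
}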

How many documents are we talking about? You've given no data on whether you
expect 100 documents or 1,000,000,000,000,000 documents.
How fast is the data being added to your syslogs?

***** It's a sequential stream of data, and I need to find the maximum
possible indexing rate for a given piece of hardware. I am able to index 500
logs/sec with the following configuration, running this Java process alone on
my machine:

Each log record size: 250 bytes
Number of fields: 14
Hardware: 1.8 GHz CPU, 512 MB RAM, Mandrake Linux
MERGE_FACTOR = 100
MAX_BUFFER_DOCS = 250
Lucene takes 500 bytes to index/store each log record.
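
For reference, applying those two settings looks something like this (a
minimal sketch assuming the Lucene 2.x-era API; the index path is made up
for illustration):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class WriterConfig {
    public static void main(String[] args) throws Exception {
        IndexWriter writer =
            new IndexWriter("/tmp/syslog-index", new StandardAnalyzer(), true);
        writer.setMergeFactor(100);     // MERGE_FACTOR = 100
        writer.setMaxBufferedDocs(250); // MAX_BUFFER_DOCS = 250
        writer.close();
    }
}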

But I need to find the optimal index size, so that I can create a new index
directory whenever the current index reaches that size.
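
What I have in mind is something like the following rollover sketch, assuming
the Lucene 2.x-era API (the size threshold, directory naming, and class name
are hypothetical; Lucene itself has no built-in notion of an optimal index
size):

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

public class RollingIndexer {
    private static final long MAX_INDEX_BYTES = 1L << 30; // assumed 1 GB cap
    private int generation = 0;
    private IndexWriter writer;

    public RollingIndexer() throws Exception {
        writer = new IndexWriter(currentPath(), new StandardAnalyzer(), true);
    }

    // One directory per generation, e.g. /var/index/syslog-0, -1, ...
    private String currentPath() {
        return "/var/index/syslog-" + generation;
    }

    // Sums the on-disk size of the current index directory.
    private long indexSizeBytes() {
        long total = 0;
        File[] files = new File(currentPath()).listFiles();
        if (files != null) {
            for (int i = 0; i < files.length; i++) {
                total += files[i].length();
            }
        }
        return total;
    }

    // Adds a document, rolling to a fresh directory once the current
    // index grows past the cap. A real version would check the size
    // only every few thousand documents, not on every add.
    public void add(Document doc) throws Exception {
        writer.addDocument(doc);
        if (indexSizeBytes() >= MAX_INDEX_BYTES) {
            writer.close();
            generation++;
            writer = new IndexWriter(currentPath(), new StandardAnalyzer(), true);
        }
    }

    public void close() throws Exception {
        writer.close();
    }
}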

The reason for my question is just to find out whether anybody has faced the
same problem. If so, I would like to get expert advice.

With regards,
MSK

On 1/29/07, Erick Erickson <[EMAIL PROTECTED]> wrote:

That depends (tm Erik Hatcher) <G>. The problem with such an open-ended
question is that there are so many unique variables that it's impossible to
answer in any meaningful way. For instance....

How much work is it to parse the log files?
What kind of hardware are you using?
Are you accessing things over a network? Is there network latency involved?
Can you index various parts in parallel?
How many documents are we talking about? You've given no data on whether you
expect 100 documents or 1,000,000,000,000,000 documents.
How fast is the data being added to your syslogs?
and on and on and on.

All you can do is set up a test and see. It shouldn't be very hard to, say,
create a small program that randomizes input, pushes it through the indexing
process, and measures the result.
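
For instance, a minimal version of such a test might look like this (a sketch
assuming a Lucene 2.x-era API; the path, field names, token sizes, and
document count are arbitrary stand-ins for your real data):

import java.util.Random;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class IndexBenchmark {
    public static void main(String[] args) throws Exception {
        IndexWriter writer =
            new IndexWriter("/tmp/bench-index", new StandardAnalyzer(), true);
        writer.setMergeFactor(100);     // same settings as discussed above
        writer.setMaxBufferedDocs(250);

        Random rnd = new Random(42);    // fixed seed for repeatable runs
        int numDocs = 100000;
        long start = System.currentTimeMillis();
        for (int n = 0; n < numDocs; n++) {
            Document doc = new Document();
            // 14 fields of random text, roughly matching the record
            // shape described in this thread
            for (int f = 0; f < 14; f++) {
                doc.add(new Field("field" + f, randomToken(rnd),
                        Field.Store.YES, Field.Index.TOKENIZED));
            }
            writer.addDocument(doc);
        }
        writer.close();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println((numDocs * 1000L) / elapsed + " docs/sec");
    }

    // Generates a random 16-character token; adjust to match real data.
    private static String randomToken(Random rnd) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < 16; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        return sb.toString();
    }
}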

Best
Erick

On 1/29/07, Saravana <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> Did anybody use Lucene to index syslogs? What is the maximum indexing rate
> that we can get to store a 200-byte document with 14 fields?
>
> thanks,
> MSK
>
> --
> Every day brings us a sea of opportunities
>
>

