Re: Lucene as syslog storage

2006-06-20 Thread Benjamin Stein
I've personally indexed over 1,000,000 documents and Lucene doesn't even breathe hard. We are in the hundreds of millions and growing, and Lucene does tend to sweat a little bit, although it can certainly handle it. You're going to have to understand the internals of Lucene a bit more.

Re: Lucene as syslog storage

2006-06-18 Thread Otis Gospodnetic
Original Message From: Andreas Moroder <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Sunday, June 18, 2006 12:58:16 PM Subject: Lucene as syslog storage Hello, I would like to write an application to browse around and search the log files of Linux machines, like www.splunk.org does

Re: Lucene as syslog storage

2006-06-18 Thread Erick Erickson
there's somebody on the mailing list who's talking about indexing a Billion (with a "B") documents. I don't know how far they've gotten, but at least *somebody* has contemplated a huge archive ... If memory serves, s/he had indexed a significant number of documents, you might try searching for "bi

Re: Lucene as syslog storage

2006-06-18 Thread Andreas Moroder
Ray Tsang wrote: > I think it ultimately depends on what you would like to do with the > stored data? Would you need more of full text searches on the log or > more of statistical analysis? > > ray, Hello Ray, possibly both, but the full text filtering is more important. Bye Andreas

Re: Lucene as syslog storage

2006-06-18 Thread Ray Tsang
I think it ultimately depends on what you would like to do with the stored data? Would you need more of full text searches on the log or more of statistical analysis? ray, On 6/18/06, Andreas Moroder <[EMAIL PROTECTED]> wrote: Hello, I would like to write an application to browse around and se

Lucene as syslog storage

2006-06-18 Thread Andreas Moroder
Hello, I would like to write an application to browse around and search the log files of Linux machines, like www.splunk.org does. Would Lucene be the right db to store such text information? Because the log info should be stored in the db continuously and not as batch, this would create many tho