Re: Lucene as syslog storage

2006-06-18 Thread Otis Gospodnetic
Andreas, Not a problem. As a matter of fact, when I first saw Splunk months ago, I thought to myself "Are they using Lucene for this?" I've had the same idea a looong time back, but of course that's just one of many ideas I didn't have time for. Otis - Original Message From: Andrea

Re: Lucene as syslog storage

2006-06-18 Thread Erick Erickson
There's somebody on the mailing list who's talking about indexing a Billion (with a "B") documents. I don't know how far they've gotten, but at least *somebody* has contemplated a huge archive ... If memory serves, s/he had indexed a significant number of documents; you might try searching for "bi

Re: Unique indexes?

2006-06-18 Thread Erick Erickson
Every document insertion generates a new doc id (doc.id()). It just bumps by one for each entry. But that ID changes upon re-indexing, and if you submit the same doc twice, it gets indexed twice and you have two documents in your index that are indistinguishable except for Lucene's ID. Otherwise, you
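The duplicate behaviour described above, and the usual way around it (remove any entry with the same key, then add), can be sketched in plain Python. This is a toy model under stated assumptions, not Lucene's API: `ToyIndex`, `add`, and `upsert` are hypothetical names, and the `ItemID` field is borrowed from the original question.

```python
class ToyIndex:
    """Toy stand-in for an index; this is NOT Lucene's API."""

    def __init__(self):
        self.docs = []      # list of (internal_id, fields) tuples
        self.next_id = 0    # internal ID, bumped by one per insertion

    def add(self, fields):
        """Blind insert: the same fields added twice yield two entries."""
        doc_id = self.next_id
        self.next_id += 1
        self.docs.append((doc_id, dict(fields)))
        return doc_id

    def upsert(self, key_field, fields):
        """Delete every entry sharing the key value, then add the new one."""
        key = fields[key_field]
        self.docs = [(i, f) for i, f in self.docs if f.get(key_field) != key]
        return self.add(fields)


index = ToyIndex()
index.add({"ItemID": "42", "body": "first"})
index.add({"ItemID": "42", "body": "first"})   # duplicate; only the internal ID differs
duplicates = len(index.docs)

index.upsert("ItemID", {"ItemID": "42", "body": "second"})
after_upsert = len(index.docs)
```

In this sketch uniqueness is enforced by the caller, since the index itself accepts any insert; the surviving entry also receives a fresh internal ID, which is why that ID is a poor choice for a stable key.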

Re: Lucene as syslog storage

2006-06-18 Thread Andreas Moroder
Ray Tsang wrote: > I think it ultimately depends on what you would like to do with the > stored data? Would you need more of full text searches on the log or > more of statistical analysis? > > ray, Hello Ray, possibly both, but the full text filtering is more important. Bye Andreas --

Re: Lucene as syslog storage

2006-06-18 Thread Ray Tsang
I think it ultimately depends on what you would like to do with the stored data? Would you need more of full text searches on the log or more of statistical analysis? ray, On 6/18/06, Andreas Moroder <[EMAIL PROTECTED]> wrote: Hello, I would like to write an application to browse around and se

Lucene as syslog storage

2006-06-18 Thread Andreas Moroder
Hello, I would like to write an application to browse around and search the log files of Linux machines, like www.splunk.org does. Would Lucene be the right db to store such text information? Because the log info should be stored in the db continuously and not as batch, this would create many tho

Unique indexes?

2006-06-18 Thread Michael J. Prichard
Is there anything like a unique key for Lucene indexes? For example, say I want to have unique ItemIDs in my index...do I need to check for that before insert or can I lock it down with Lucene's API?

Re: indexing emails --> multiple "to" emails, setting position same

2006-06-18 Thread Michael J. Prichard
So I have emails with multiple recipients (of course, this is very common). I currently put them all in the same string separated by spaces and then tokenize them with StandardAnalyzer. I was looking into SynonymAnalyzers and see that you can drop multiple tokens with the same position. Woul
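To illustrate what "multiple tokens with the same position" means, here is a minimal plain-Python sketch. In Lucene a token's position increment of 0 places it at the same position as the previous token; the code below only models that arithmetic. The function names and the example addresses are hypothetical, and none of this is Lucene's API.

```python
def stack_at_same_position(terms):
    """Return (term, position_increment) pairs.

    The first term advances the position by 1; every later term uses an
    increment of 0, so it lands on the same position as the first.
    """
    return [(term, 1 if i == 0 else 0) for i, term in enumerate(terms)]


def absolute_positions(stream):
    """Turn (term, increment) pairs into (term, absolute_position) pairs."""
    position = -1
    placed = []
    for term, increment in stream:
        position += increment
        placed.append((term, position))
    return placed


recipients = ["alice@example.com", "bob@example.com", "carol@example.com"]
stream = [("to", 1)] + stack_at_same_position(recipients) + [("end", 1)]
placed = absolute_positions(stream)
```

Here all three recipients share position 1, and the token that follows them sits at position 2, just as it would after a single recipient.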

Re: indexing emails

2006-06-18 Thread Michael J. Prichard
This is great! I was hoping to find some people who are dealing with this issue. I am going to try to tokenize the email addresses and see what that does. I am going to use a StandardAnalyzer which (if I am not mistaken) will keep the email address as is. Would I still have to use PrefixQue

Use Lucene for "Frequent Itemset" computation

2006-06-18 Thread PrasenjitM
Just curious, has anyone tried to use Lucene to do "frequent itemset" computation for auto-clustering? For a given pair of items/words we can use Lucene and do a conjunctive search for those items. This works only if we can figure out a way to minimize the number of possible itemsets. Thanks, Prasen -
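One possible way to keep the number of candidate itemsets down is the Apriori property: a pair can only be frequent if each of its items is frequent on its own, so infrequent items are discarded before any pair is tried. The sketch below is plain Python, not Lucene. The conjunctive search is modelled as an intersection of posting sets, and `build_postings`, `frequent_pairs`, `min_support`, and the sample data are assumptions made for illustration.

```python
from itertools import combinations


def build_postings(transactions):
    """Map each item to the set of transaction IDs that contain it."""
    postings = {}
    for doc_id, items in enumerate(transactions):
        for item in set(items):
            postings.setdefault(item, set()).add(doc_id)
    return postings


def frequent_pairs(transactions, min_support):
    """Return {(a, b): support} for item pairs seen together often enough."""
    postings = build_postings(transactions)
    # Apriori pruning: drop items that are infrequent on their own.
    frequent_items = sorted(
        item for item, docs in postings.items() if len(docs) >= min_support
    )
    result = {}
    for a, b in combinations(frequent_items, 2):
        # Conjunctive ("a AND b") search modelled as a set intersection.
        support = len(postings[a] & postings[b])
        if support >= min_support:
            result[(a, b)] = support
    return result


transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "jam"],
]
pairs = frequent_pairs(transactions, min_support=2)
```

With this sample data "jam" is pruned before any pair is formed, so only three candidate pairs are intersected instead of six.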