date:20060618

Re: Lucene as syslog storage

2006-06-18 Thread Otis Gospodnetic

Andreas, Not a problem. As a matter of fact, when I first saw Splunk months ago, I thought to myself "Are they using Lucene for this?" I've had the same idea a looong time back, but of course that's just one of many ideas I didn't have time for. Otis - Original Message From: Andrea

Re: Lucene as syslog storage

2006-06-18 Thread Erick Erickson

there's somebody on the mailing list who's talking about indexing a Billion (with a "B") documents. I don't know how far they've gotten, but at least *somebody* has contemplated a huge archive ... If memory serves, s/he had indexed a significant number of documents, you might try searching for "bi

Re: Unique indexes?

2006-06-18 Thread Erick Erickson

Every document insertion generates a new doc id (doc.id()). It just bumps by one for each entry. But that ID changes upon re-indexing, and if you submit the same doc twice, it gets indexed twice and you have two documents in your index that are indistinguishable except for Lucene's ID. Otherwise, you
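
The behaviour described in this reply can be illustrated with a toy in-memory index. This is a minimal Python sketch and not Lucene's API: `ToyIndex`, `add`, and `upsert` are hypothetical names, and the `ItemID` field is borrowed from the original question. It assumes uniqueness is enforced by the application, by deleting any document with the same key before adding the new one.

```python
class ToyIndex:
    """Hypothetical stand-in for an index; this is not Lucene's API."""

    def __init__(self):
        self.docs = {}      # internal doc id -> document (dict of fields)
        self.next_id = 0    # bumps by one for each inserted document

    def add(self, doc):
        """Blindly add a document; duplicates are not detected."""
        doc_id = self.next_id
        self.docs[doc_id] = dict(doc)
        self.next_id += 1
        return doc_id

    def upsert(self, key_field, doc):
        """Delete every document sharing the key value, then add the new one."""
        key = doc[key_field]
        stale = [i for i, d in self.docs.items() if d.get(key_field) == key]
        for i in stale:
            del self.docs[i]
        return self.add(doc)


index = ToyIndex()
first = index.add({"ItemID": "42", "body": "hello"})
second = index.add({"ItemID": "42", "body": "hello"})
duplicates = len(index.docs)    # both copies are present, differing only by id

third = index.upsert("ItemID", {"ItemID": "42", "body": "hello again"})
remaining = len(index.docs)     # only the newest copy survives
```

The linear scan in `upsert` is only for illustration; a real index would look the key up through its term dictionary.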

Re: Lucene as syslog storage

2006-06-18 Thread Andreas Moroder

Ray Tsang wrote: > I think it ultimately depends on what you would like to do with the > stored data? Would you need more of full text searches on the log or > more of statistical analysis? > > ray, Hello Ray, possibly both, but the full text filtering is more important. Bye Andreas --

Re: Lucene as syslog storage

2006-06-18 Thread Ray Tsang

I think it ultimately depends on what you would like to do with the stored data? Would you need more of full text searches on the log or more of statistical analysis? ray, On 6/18/06, Andreas Moroder <[EMAIL PROTECTED]> wrote: Hello, I would like to write an application to browse around and se

Lucene as syslog storage

2006-06-18 Thread Andreas Moroder

Hello, I would like to write an application to browse around and search the log files of Linux machines, like www.splunk.org does. Would Lucene be the right db to store such text information? Because the log info should be stored in the db continuously and not as batch, this would create many tho

Unique indexes?

2006-06-18 Thread Michael J. Prichard

Is there anything like a unique key for Lucene indexes? For example, say I want to have unique ItemIDs in my index... do I need to check for that before insert or can I lock it down with Lucene's API?

Re: indexing emails --> multiple "to" emails, setting position same

2006-06-18 Thread Michael J. Prichard

So I have emails with multiple recipients (of course, this is very common). I currently put them all on the same string separated by space and then tokenize them with StandardAnalyzer. I was looking into SynonymAnalyzers and see that you can drop multiple tokens with the same position. Woul
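
The idea of placing several tokens at one position can be sketched without Lucene. The following Python is a hypothetical illustration only: the function name and the sample addresses are assumptions, not taken from the thread. The first recipient advances the position by one, and every further recipient gets a position increment of zero, so all addresses share a single position.

```python
def recipients_to_tokens(recipients, start_position=0):
    """Return (token, position_increment, absolute_position) tuples.

    Hypothetical helper: the first address advances the position by 1;
    every later address uses an increment of 0 and so shares that position.
    """
    tokens = []
    position = start_position
    for n, address in enumerate(recipients):
        increment = 1 if n == 0 else 0
        position += increment
        tokens.append((address.lower(), increment, position))
    return tokens


tokens = recipients_to_tokens(["Alice@example.com", "bob@example.com"])
```

Whether sharing a position actually helps depends on the queries involved; the sketch only shows what the resulting token stream would look like.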

Re: indexing emails

2006-06-18 Thread Michael J. Prichard

This is great! I was hoping to find some people who are dealing with this issue. I am going to try to tokenize the email addresses and see what that does. I am going to use a StandardAnalyzer which (if I am not mistaken) will keep the email address as is. Would I still have to use PrefixQue

Use Lucene for "Frequent Itemset" computation

2006-06-18 Thread PrasenjitM

Just curious, has anyone tried to use Lucene to do "frequent itemset" computation for auto-clustering? For a given pair of items/words we can use Lucene and do a conjunctive search for those items. Only if we can figure out a way to minimize the number of possible itemsets. Thanks, Prasen
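
One way to picture the suggestion is a toy inverted index in which a conjunctive (AND) search is an intersection of posting sets. The Python below is a minimal sketch under stated assumptions: it does not use Lucene, the function names and sample documents are hypothetical, and the pruning step is the standard Apriori idea (a pair can only be frequent if both of its items are), which the post itself does not name.

```python
from collections import defaultdict
from itertools import combinations


def build_postings(docs):
    """Map each word to the set of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, words in enumerate(docs):
        for word in set(words):
            postings[word].add(doc_id)
    return postings


def frequent_pairs(docs, min_support):
    """Count co-occurring pairs via conjunctive lookups on the postings."""
    postings = build_postings(docs)
    # Apriori-style pruning: only individually frequent items form candidates.
    frequent_items = sorted(
        w for w, p in postings.items() if len(p) >= min_support
    )
    result = {}
    for a, b in combinations(frequent_items, 2):
        support = len(postings[a] & postings[b])   # AND query as intersection
        if support >= min_support:
            result[(a, b)] = support
    return result


docs = [
    ["disk", "error", "sda"],
    ["disk", "error"],
    ["login", "error"],
    ["disk", "sda"],
]
pairs = frequent_pairs(docs, min_support=2)
```

With these four sample documents, `login` is dropped before any pair is formed, which is the kind of candidate reduction the post asks about.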

10 matches
