Andreas,
Not a problem. As a matter of fact, when I first saw Splunk months ago, I
thought to myself "Are they using Lucene for this?" I've had the same idea a
looong time back, but of course that's just one of many ideas I didn't have
time for.
Otis
- Original Message
From: Andreas
there's somebody on the mailing list who's talking about indexing a Billion
(with a "B") documents. I don't know how far they've gotten, but at least
*somebody* has contemplated a huge archive ... If memory serves, s/he had
indexed a significant number of documents; you might try searching for
"billion".
Every document insertion generates a new doc ID (doc.id()). It just bumps by
one for each entry. But that ID changes upon re-indexing, and if you submit
the same doc twice, it gets indexed twice and you have two documents in your
index that are indistinguishable except for Lucene's ID.
Otherwise, you
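To make that concrete, here is a minimal, untested sketch (assuming a recent Lucene release, 8.x or later; the field name and sample text are invented for illustration) showing that adding the same content twice simply yields two documents:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class DuplicateDemo {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

        // The "same" document, submitted twice: Lucene does not deduplicate.
        Document doc = new Document();
        doc.add(new TextField("body", "disk full on host alpha", Field.Store.YES));
        writer.addDocument(doc);   // gets one internal doc ID
        writer.addDocument(doc);   // gets the next internal doc ID
        writer.close();

        DirectoryReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        int hits = searcher.count(new TermQuery(new Term("body", "alpha")));
        System.out.println("hits = " + hits);  // prints 2: two indistinguishable documents
        reader.close();
    }
}

If you need to tell the two apart (or avoid the duplicate entirely), you have to store your own application-level ID in a field; the internal doc ID is not stable across merges or re-indexing.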
Ray Tsang wrote:
> I think it ultimately depends on what you would like to do with the
> stored data. Would you need more full-text searching on the logs, or
> more statistical analysis?
>
> ray,
Hello Ray,
Possibly both, but the full-text filtering is more important.
Bye
Andreas
--
I think it ultimately depends on what you would like to do with the
stored data. Would you need more full-text searching on the logs, or
more statistical analysis?
ray,
On 6/18/06, Andreas Moroder <[EMAIL PROTECTED]> wrote:
Hello,
I would like to write an application to browse around and se
Hello,
I would like to write an application to browse around and search the log
files of Linux machines, like www.splunk.org does.
Would Lucene be the right db to store such text information?
Because the log info should be stored in the db continuously and not as a
batch, this would create many tho
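A rough, untested sketch of that kind of continuous (non-batch) indexing, assuming a recent Lucene release; the paths and field names below are invented for illustration only:

import java.io.BufferedReader;
import java.io.FileReader;
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class LogIndexer {
    public static void main(String[] args) throws Exception {
        // One long-lived writer; log lines are added as they arrive, not in a batch.
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("/var/tmp/log-index")),
                new IndexWriterConfig(new StandardAnalyzer()));

        try (BufferedReader in = new BufferedReader(new FileReader("/var/log/syslog"))) {
            String line;
            long count = 0;
            while ((line = in.readLine()) != null) {
                Document doc = new Document();
                doc.add(new StringField("host", "alpha", Field.Store.YES));   // exact-match metadata
                doc.add(new TextField("message", line, Field.Store.YES));     // full-text searchable
                writer.addDocument(doc);
                if (++count % 1000 == 0) {
                    writer.commit();  // make recent lines visible to searchers periodically
                }
            }
        }
        writer.commit();
        writer.close();
    }
}

A real tool would tail the file rather than read it once, but the pattern is the same: keep the IndexWriter open and commit periodically instead of re-building the index in batches.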
Is there anything like a unique key for Lucene indexes? For example,
say I want to have unique ItemIDs in my index... do I need to check for
that before insert, or can I lock it down with Lucene's API?
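Lucene has no unique-key constraint as such, but IndexWriter.updateDocument(Term, Document) deletes any existing documents containing the given term and then adds the new one, which is the usual way to keep an application-level ID unique. An untested sketch (the "itemId" field name is just an example, assuming a recent Lucene release):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;

public class UniqueKeyDemo {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(new ByteBuffersDirectory(),
                new IndexWriterConfig(new StandardAnalyzer()));

        Document doc = new Document();
        // StringField is indexed untokenized, so the whole ID is one exact term.
        doc.add(new StringField("itemId", "ITEM-42", Field.Store.YES));
        doc.add(new TextField("title", "first version", Field.Store.YES));

        // Deletes any document whose itemId term is ITEM-42, then adds this one,
        // so re-submitting the "same" item never produces duplicates.
        writer.updateDocument(new Term("itemId", "ITEM-42"), doc);
        writer.updateDocument(new Term("itemId", "ITEM-42"), doc);  // still only one doc in the index

        writer.commit();
        writer.close();
    }
}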
So I have emails with multiple recipients (of course, this is very
common). I currently put them all in the same string, separated by spaces,
and then tokenize them with StandardAnalyzer. I was looking into
SynonymAnalyzers and see that you can add multiple tokens at the same
position. Woul
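A simpler alternative to a same-position SynonymAnalyzer is Lucene's multi-valued fields: instead of concatenating recipients into one space-separated string, add the "to" field once per recipient, so each address stays a single exact term. A rough sketch under that assumption (field names invented):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

public class RecipientFields {
    // Build a document with one "to" value per recipient. StringField keeps
    // each address as a single untokenized term, so TermQuery/PrefixQuery on
    // "to" match individual recipients directly.
    static Document emailDocument(String subject, String... recipients) {
        Document doc = new Document();
        doc.add(new TextField("subject", subject, Field.Store.YES));
        for (String to : recipients) {
            doc.add(new StringField("to", to.toLowerCase(), Field.Store.YES));
        }
        return doc;
    }
}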
This is great! I was hoping to find some people who are dealing with
this issue. I am going to try to tokenize the email addresses and see
what that does. I am going to use a StandardAnalyzer which (if I am not
mistaken) will keep each email address as is. Would I still have to use
PrefixQuery
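Whether StandardAnalyzer keeps an email address as a single token depends on the Lucene version, as far as I recall (the older classic tokenizer did; the newer UAX#29-based one splits at "@" and "."). The sketch below sidesteps the analyzer question by assuming the untokenized, lowercased "to" field from the previous sketch, and shows a PrefixQuery matching partial addresses such as "jsmith@" (all names here are illustrative, not tested):

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.store.Directory;

public class RecipientSearch {
    // Counts messages with any recipient starting with the given prefix.
    // Assumes the "to" field was indexed as an untokenized, lowercased
    // StringField (one term per address).
    static int countByRecipientPrefix(Directory dir, String prefix) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            return searcher.count(new PrefixQuery(new Term("to", prefix.toLowerCase())));
        }
    }
}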
Just curious, has anyone tried to use Lucene to do "frequent itemset"
computation for auto-clustering? For a given pair of items/words we can use
Lucene and do a conjunctive search on those items. The open question is whether
we can figure out a way to minimize the number of possible itemsets.
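One way to read "conjunctive search" here: the support of an itemset {a, b} is just the hit count of a BooleanQuery with both terms as MUST clauses. A hedged sketch of that step only (the "text" field name is invented; pruning the candidate itemsets is the harder part and is not addressed):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class ItemsetSupport {
    // Support count of the itemset {a, b}: number of documents containing both terms.
    static int support(IndexSearcher searcher, String a, String b) throws Exception {
        BooleanQuery.Builder both = new BooleanQuery.Builder();
        both.add(new TermQuery(new Term("text", a)), BooleanClause.Occur.MUST);
        both.add(new TermQuery(new Term("text", b)), BooleanClause.Occur.MUST);
        return searcher.count(both.build());
    }
}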
Thanks,
Prasen