I think this is doable.  Lucene compares how similar query terms are to
terms in indexed documents.  Flip that around, and you are close to a
solution.  Perhaps you'll want to process incoming documents to keep
only the top N most-important terms or phrases.  You could then create
a query out of those terms and run it against indexed queries. 
There'll be some tricks, such as dealing with situations where your
query contains a phrase, and you have to prevent individual terms in
incoming documents to matching individual words of that phrase...

Tell us how this goes, I'm interested in this, too.

Otis



--- Robert Watkins <[EMAIL PROTECTED]> wrote:
> We are considering Lucene as a replacement for Verity K2 (I won't go
> into the myriad reasons, other than to highlight that the K2 Java API
> is riddled with errors, falsities and just plain stupidity [okay,
> I've had my rant]) and figure that Lucene can do what we are
> currently using K2 for, except for what Verity calls "profiling"
> (matching incoming documents against a stored index of queries).
> 
> The very question I need answered was asked on 31 October, 2001 in
> a message with the subject "Alert function" (message ID 116006, I do
> believe), but there was never an answer. I will, in fact, quote this
> message, as it states the question very well indeed:
> 
> > One thing I need is an alert function, that is, instead of
> searching
> > with a query on a lot of documents, I want all incoming documents
> to be
> > searched against a bunch of stored search queries (called
> "agents").
> >
> > The simple solution is to index the document and then iterate
> through
> > all agents, but that isn't very scalable (say there are 10 new
> documents
> > a minute, and 10000 stored agents...).
> >
> > The way commercial products like Verity and Autonomy handles this
> is to
> > store the search queries (the agents) instead, and then use the
> incoming
> > document as a search query against the stored agents. This way it
> > becomes extremely scalable.
> >
> > Could this be a solution with Lucene too, and how would that look
> like?
> 
> This was originally posted by a Christian Ubbesen. I have tried to
> contact
> him, to see if he had managed to solve this, but have not heard back
> (hey,
> he could well have moved on in 3.5 years!)
> 
> Thanks,
> -- Robert
> 
> --------------------
> Robert Watkins
> [EMAIL PROTECTED]
> --------------------
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to