I think this is doable. Lucene compares query terms against the terms in indexed documents to measure how similar they are. Flip that around, and you are close to a solution. You will probably want to process each incoming document to keep only its top N most important terms or phrases. You could then build a query out of those terms and run it against an index of the stored queries. There will be some tricks. For example, when a stored query contains a phrase, you have to prevent individual terms in an incoming document from matching individual words of that phrase...
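To make the inversion concrete, here is a minimal sketch in plain Java rather than Lucene. None of the names below are Lucene API: `QueryProfiler`, `addQuery`, `match`, `topTerms` and the "list of clauses" query representation are all hypothetical. Stored queries are indexed by their terms. Each incoming document is reduced to its top N terms, those terms pull candidate queries out of the index, and each candidate is then checked against the full document, so a phrase only matches when its words appear consecutively. Two assumptions of the sketch: a stored query matches only when all of its clauses match, and raw term frequency stands in for a real importance measure such as tf-idf.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch of "profiling" (reverse search) in plain Java, not Lucene:
// stored queries are indexed by their terms, and each incoming document is
// turned into a lookup against that index.
class QueryProfiler {

    // Each stored query is a list of clauses; each clause is a list of terms.
    // A one-term clause is a plain term, a longer clause is a phrase.
    // Assumption: a query matches only if every one of its clauses matches.
    private final Map<String, List<List<String>>> queries = new HashMap<>();

    // Inverted index over the stored queries: term -> ids of queries using it.
    private final Map<String, Set<String>> termToQueries = new HashMap<>();

    public void addQuery(String id, List<List<String>> clauses) {
        List<List<String>> normalized = new ArrayList<>();
        for (List<String> clause : clauses) {
            List<String> terms = new ArrayList<>();
            for (String term : clause) {
                String t = term.toLowerCase();
                terms.add(t);
                termToQueries.computeIfAbsent(t, k -> new HashSet<>()).add(id);
            }
            normalized.add(terms);
        }
        queries.put(id, normalized);
    }

    // Lowercase and split on non-word characters; a stand-in for a real analyzer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Keep the N most frequent terms (ties broken alphabetically). Raw frequency
    // stands in for a real importance measure such as tf-idf.
    static List<String> topTerms(List<String> tokens, int n) {
        Map<String, Integer> freq = new HashMap<>();
        for (String t : tokens) freq.merge(t, 1, Integer::sum);
        return freq.entrySet().stream()
                .sorted((a, b) -> {
                    int byFreq = Integer.compare(b.getValue(), a.getValue());
                    return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
                })
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // True if the clause occurs as consecutive tokens in the document.
    static boolean containsPhrase(List<String> tokens, List<String> phrase) {
        return Collections.indexOfSubList(tokens, phrase) >= 0;
    }

    // Returns the ids of stored queries matched by the document, sorted by id.
    public List<String> match(String document, int topN) {
        List<String> tokens = tokenize(document);
        // Step 1: candidates are queries sharing at least one of the top-N terms.
        Set<String> candidates = new TreeSet<>();
        for (String term : topTerms(tokens, topN)) {
            candidates.addAll(termToQueries.getOrDefault(term, Collections.emptySet()));
        }
        // Step 2: verify each candidate against the full token list. Phrases must
        // appear contiguously, so scattered words of a phrase do not match.
        List<String> matches = new ArrayList<>();
        for (String id : candidates) {
            boolean allClausesMatch = true;
            for (List<String> clause : queries.get(id)) {
                if (!containsPhrase(tokens, clause)) {
                    allClausesMatch = false;
                    break;
                }
            }
            if (allClausesMatch) matches.add(id);
        }
        return matches;
    }
}
```

For example, with a stored term query `lucene` and a stored phrase query `search engine`, the document "Lucene is a search library. Lucene indexes text; an engine for search." matches the first query but not the second, because "search" and "engine" never appear side by side. The top-N cut-off is both the scalability lever and the main risk: a stored query whose terms all fall outside a document's top N is never even considered, so N has to be tuned against missed alerts.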
Tell us how this goes; I'm interested in this, too.

Otis

--- Robert Watkins <[EMAIL PROTECTED]> wrote:
> We are considering Lucene as a replacement for Verity K2 (I won't go
> into the myriad reasons, other than to highlight that the K2 Java API
> is riddled with errors, falsities and just plain stupidity [okay,
> I've had my rant]) and figure that Lucene can do what we are
> currently using K2 for, except for what Verity calls "profiling"
> (matching incoming documents against a stored index of queries).
>
> The very question I need answered was asked on 31 October, 2001 in
> a message with the subject "Alert function" (message ID 116006, I do
> believe), but there was never an answer. I will, in fact, quote this
> message, as it states the question very well indeed:
>
> > One thing I need is an alert function, that is, instead of searching
> > with a query on a lot of documents, I want all incoming documents to
> > be searched against a bunch of stored search queries (called
> > "agents").
> >
> > The simple solution is to index the document and then iterate
> > through all agents, but that isn't very scalable (say there are 10
> > new documents a minute, and 10000 stored agents...).
> >
> > The way commercial products like Verity and Autonomy handle this is
> > to store the search queries (the agents) instead, and then use the
> > incoming document as a search query against the stored agents. This
> > way it becomes extremely scalable.
> >
> > Could this be a solution with Lucene too, and what would that look
> > like?
>
> This was originally posted by a Christian Ubbesen. I have tried to
> contact him, to see if he had managed to solve this, but have not
> heard back (hey, he could well have moved on in 3.5 years!)
>
> Thanks,
> -- Robert
>
> Robert Watkins
> [EMAIL PROTECTED]