> What are the current limitations of the Lucene Highlighter? How does it
> perform under high query load?

The major bottlenecks are typically in retrieving the document content and then re-tokenizing it with an Analyzer - not in the actual choice of highlighting code. I've not used the Nutch summarizer, so I couldn't say what speed difference you might expect in the highlighting stage.

In terms of functionality, from a quick glance at the code I would say the Nutch summarizer is probably missing the following highlighter features:

* Choice of field (hardcoded to "content")
* Choice of Analyzer
* Re-ordering selected fragments to natural order
* Choice of markup (eg span vs <b>)
* Support for tokenStreams with overlapping tokens (eg synonyms)
* Support for term weightings in fragment selection (eg IDF)

The Nutch summarizer also looks to drag in Nutch-specific classes, eg using Nutch's Query object rather than Lucene's.

Currently both summarizers can mistakenly highlight terms that are part of a phrase query where only one term actually matches. This is less than ideal, but the solution requires a major rewrite of both highlighters' logic.

Cheers
Mark
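
P.S. To make the phrase-query problem concrete, here is a minimal sketch in plain Java. It is not code from Lucene or Nutch: the class NaiveTermHighlighter and its highlight method are hypothetical, and the whitespace tokenizing is a stand-in for a real Analyzer. It only shows what happens when a highlighter treats a phrase query as a bag of independent terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not the Lucene or Nutch implementation.
public class NaiveTermHighlighter {

    // Wraps every token that equals a query term in <b>..</b>.
    // Phrase structure is ignored: each term is matched on its own.
    public static String highlight(String text, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        for (String token : text.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (queryTerms.contains(token.toLowerCase())) {
                out.append("<b>").append(token).append("</b>");
            } else {
                out.append(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Terms taken from the phrase query "quick fox".
        Set<String> phraseTerms = new HashSet<>(Arrays.asList("quick", "fox"));
        // Prints: The <b>quick</b> dog chased the <b>quick</b> <b>fox</b>
        System.out.println(highlight("The quick dog chased the quick fox", phraseTerms));
    }
}
```

The first "quick" is highlighted even though it is followed by "dog" and so is not part of the phrase. Avoiding that means checking term positions against the phrase rather than testing each term in isolation, which is why the fix touches the core matching logic.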