> What are the
> current limitations of the
> Lucene Highlighter?  How does it perform under high
> query load?  

The major bottlenecks are typically in retrieving
document content and then re-tokenizing with an
Analyzer - not the actual choice of highlighting code.


I've not used the Nutch summarizer, so I couldn't say
what speed difference you might expect at the
highlighting stage. In terms of functionality, from a
quick glance at the code I would say it is probably
missing the following highlighter features:
* Choice of field (hardcoded to "content")
* Choice of Analyzer
* Re-ordering selected fragments to natural order
* Choice of markup (eg span vs <b>)
* Support for tokenStreams with overlapping tokens (eg
synonyms)
* Support for term weightings in fragment selection
(eg IDF)
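
To make the fragment-selection points above a bit more
concrete, here is a rough, self-contained sketch of
weighted fragment selection followed by re-ordering into
natural (document) order. It is not the Lucene
highlighter's code and uses none of its classes: the class
name, the fixed word-count fragments, the whitespace
tokenizing and the termWeights map (standing in for
something like IDF) are all assumptions made for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Lucene code: pick the best-scoring
// fragments, then show them in document order.
final class FragmentSelectionSketch {

    /** A candidate fragment: index of its first word, its text, its score. */
    static final class Fragment {
        final int startWord;
        final String text;
        final double score;

        Fragment(int startWord, String text, double score) {
            this.startWord = startWord;
            this.text = text;
            this.score = score;
        }
    }

    /** Sums the weights of query terms in words[from, to); repeats count each time. */
    static double score(String[] words, int from, int to,
                        Map<String, Double> termWeights) {
        double total = 0.0;
        for (int i = from; i < to; i++) {
            total += termWeights.getOrDefault(words[i].toLowerCase(), 0.0);
        }
        return total;
    }

    static List<Fragment> bestFragments(String doc, int wordsPerFragment,
                                        int maxFragments,
                                        Map<String, Double> termWeights) {
        if (wordsPerFragment <= 0 || maxFragments < 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Assumption: whitespace tokenizing stands in for a real Analyzer.
        String[] words = doc.trim().split("\\s+");
        List<Fragment> candidates = new ArrayList<>();
        for (int from = 0; from < words.length; from += wordsPerFragment) {
            int to = Math.min(words.length, from + wordsPerFragment);
            double s = score(words, from, to, termWeights);
            if (s > 0.0) { // skip fragments with no query terms at all
                String text = String.join(" ", Arrays.copyOfRange(words, from, to));
                candidates.add(new Fragment(from, text, s));
            }
        }
        // Highest score first; the sort is stable, so ties keep document order.
        candidates.sort((a, b) -> Double.compare(b.score, a.score));
        List<Fragment> best = new ArrayList<>(
                candidates.subList(0, Math.min(maxFragments, candidates.size())));
        // Put the winners back into natural (document) order for display.
        best.sort((a, b) -> Integer.compare(a.startWord, b.startWord));
        return best;
    }

    public static void main(String[] args) {
        String doc = "rare highlighter term nothing relevant here "
                   + "lucene indexes documents highlighter and highlighter";
        Map<String, Double> weights = Map.of("lucene", 1.0, "highlighter", 5.0);
        for (Fragment f : bestFragments(doc, 3, 2, weights)) {
            System.out.println(f.startWord + ": " + f.text + " (" + f.score + ")");
        }
    }
}
```

Without the final sort the fragments would come back in
score order, which tends to read oddly when the best
fragment is from the end of the document.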

The Nutch summarizer also looks to drag in
Nutch-specific classes, e.g. it uses Nutch's Query
object, not Lucene's.

Currently both the Lucene highlighter and the Nutch
summarizer can mistakenly highlight terms that are part
of a phrase query when only one term of the phrase
actually matches. This is less than ideal, but the
solution requires a major rewrite of both
implementations' logic.
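
As a toy illustration of the phrase problem (not code
from either project), the sketch below contrasts
term-by-term marking with marking that only fires when
the whole phrase appears in order. The class and method
names are made up, and whitespace splitting stands in
for a real Analyzer.

```java
import java.util.List;

// Hypothetical sketch: why per-term highlighting over-marks phrase queries.
final class PhraseHighlightSketch {

    /** Marks every token equal to any phrase term, ignoring position. */
    static String highlightTerms(String text, List<String> phraseTerms) {
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (phraseTerms.contains(tokens[i].toLowerCase())) {
                tokens[i] = "<b>" + tokens[i] + "</b>";
            }
        }
        return String.join(" ", tokens);
    }

    /** Marks tokens only where all phrase terms occur consecutively, in order. */
    static String highlightPhrase(String text, List<String> phraseTerms) {
        String[] tokens = text.trim().split("\\s+");
        int n = tokens.length;
        int m = phraseTerms.size();
        boolean[] hit = new boolean[n];
        for (int i = 0; i + m <= n; i++) {
            boolean matched = true;
            for (int j = 0; j < m && matched; j++) {
                matched = tokens[i + j].equalsIgnoreCase(phraseTerms.get(j));
            }
            if (matched) {
                for (int j = 0; j < m; j++) {
                    hit[i + j] = true; // part of a complete phrase match
                }
            }
        }
        for (int i = 0; i < n; i++) {
            if (hit[i]) {
                tokens[i] = "<b>" + tokens[i] + "</b>";
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        String text = "new york is not the only new city";
        List<String> phrase = List.of("new", "york");
        System.out.println(highlightTerms(text, phrase));
        System.out.println(highlightPhrase(text, phrase));
    }
}
```

The first method marks the stray "new" near the end of
the sentence, which is the kind of false highlight
described above. The second needs token positions, which
is roughly why a fix means reworking the highlighting
logic rather than patching it.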


Cheers
Mark


                