Re: Highlighting + phrase queries

Marjan Celikik Thu, 10 Jan 2008 06:33:51 -0800

Mark Miller wrote:

The Highlighter works by comparing the TokenStream of the document with the Tokens in the query. The TokenStream can be rebuilt from the index if you use TermVectors with TokenSources or you can get it by reanalyzing the document. Each Token from the TokenStream is checked against Tokens in the query, and if there is a match you have a Highlight. The original text is then reconstructed with the Highlights from info in the TokenStream about original offsets into the document for each Token. Also, there is a Fragment system that will break apart the Highlighted text into score sorted text Fragments.
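
The token-matching step described above can be illustrated with a minimal sketch. It deliberately avoids the Lucene classes: `Tok`, `tokenize`, and `highlight` are hypothetical stand-ins for a Token with offsets, an analyzer, and the highlighting pass, and the regex tokenizer is an assumption made only to keep the example self-contained. Fragmenting and scoring are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TermHighlightSketch {
    /** Hypothetical stand-in for a Lucene Token: a term plus its offsets into the original text. */
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Toy analyzer (an assumption): lowercases runs of word characters and records their offsets. */
    static List<Tok> tokenize(String text) {
        List<Tok> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            out.add(new Tok(m.group().toLowerCase(Locale.ROOT), m.start(), m.end()));
        }
        return out;
    }

    /** Wraps every token whose term is in queryTerms, rebuilding the text from the stored offsets. */
    static String highlight(String text, Set<String> queryTerms) {
        StringBuilder sb = new StringBuilder();
        int last = 0; // end offset of the last text already copied
        for (Tok t : tokenize(text)) {
            if (queryTerms.contains(t.term)) {
                sb.append(text, last, t.start);
                sb.append("<B>").append(text, t.start, t.end).append("</B>");
                last = t.end;
            }
        }
        return sb.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("The quick fox saw a quick brown dog", Set.of("quick", "brown")));
    }
}
```

Running `main` prints `The <B>quick</B> fox saw a <B>quick</B> <B>brown</B> dog`. If the query terms came from the phrase "quick brown", the first "quick" is highlighted even though it is not part of the phrase, because only terms are compared and positions are ignored.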

OK, this is what I already knew...

That is why the original contrib does not work with PhraseQuery. It simply matches Tokens from the query with those in the TokenStream. LUCENE-794 takes the TokenStream and shoves it into a MemoryIndex. Then, after converting the query to a SpanQuery approximation, getSpans is called on the index for the query. The Spans provide a bound on what positions should be Highlighted. Everything else is done exactly like the original Highlighter. (This is a patch that fits into the original Highlighter framework that was developed, thereby retaining all of its richness :) )
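
The idea of bounding highlights by matching positions can be sketched as follows. This is not the LUCENE-794 code and uses no Lucene classes: a plain scan for exact, in-order phrase occurrences stands in for the MemoryIndex plus getSpans step, and `Tok`, `tokenize`, `phraseSpans`, and `highlight` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PhraseHighlightSketch {
    /** Hypothetical stand-in for a Lucene Token: a term plus its offsets into the original text. */
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Toy analyzer (an assumption): lowercases runs of word characters and records their offsets. */
    static List<Tok> tokenize(String text) {
        List<Tok> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            out.add(new Tok(m.group().toLowerCase(Locale.ROOT), m.start(), m.end()));
        }
        return out;
    }

    /** Returns [startPos, endPos) token-position ranges where the phrase occurs exactly and in order. */
    static List<int[]> phraseSpans(List<Tok> toks, List<String> phrase) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i + phrase.size() <= toks.size(); i++) {
            boolean match = true;
            for (int j = 0; j < phrase.size(); j++) {
                if (!toks.get(i + j).term.equals(phrase.get(j))) {
                    match = false;
                    break;
                }
            }
            if (match) {
                spans.add(new int[] {i, i + phrase.size()});
            }
        }
        return spans;
    }

    /** Highlights only tokens whose position falls inside a phrase span. */
    static String highlight(String text, List<String> phrase) {
        List<Tok> toks = tokenize(text);
        boolean[] inSpan = new boolean[toks.size()];
        for (int[] span : phraseSpans(toks, phrase)) {
            for (int p = span[0]; p < span[1]; p++) {
                inSpan[p] = true;
            }
        }
        StringBuilder sb = new StringBuilder();
        int last = 0; // end offset of the last text already copied
        for (int p = 0; p < toks.size(); p++) {
            if (inSpan[p]) {
                Tok t = toks.get(p);
                sb.append(text, last, t.start);
                sb.append("<B>").append(text, t.start, t.end).append("</B>");
                last = t.end;
            }
        }
        return sb.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("The quick fox saw a quick brown dog", List.of("quick", "brown")));
    }
}
```

Running `main` prints `The quick fox saw a <B>quick</B> <B>brown</B> dog`: the stray "quick" at the start is left alone because it lies outside every span. The offset-based reconstruction is unchanged from plain term matching; only the decision about which tokens to highlight now depends on position.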

Thanks! This is what I needed. Still I don't know how to obtain the source code of your patch :(


Marjan.


