Re: Highlighter that works with phrase and span queries

Paul Elschot Wed, 27 Jun 2007 08:45:08 -0700

On Wednesday 27 June 2007 17:17, mark harwood wrote:
> >>you would still have the major problem of which matches do you keep 
information for
> 
> Yes, doing this efficiently is the main issue. Some vague thoughts I had:
>...
> 3) For each call to scorer.next() on the top level query, the 
HighlightObserver class would check to see if the doc was a "keeper" (i.e. 
it's score places it in the required top "n" docs PriorityQueue) and if so, 
would retain a copy of all the transient match info currently held in the 
ThreadLocal for this doc and associate it with the new TopDoc object placed 
in the top docs PriorityQueue.


This can be done more efficiently by skipping the Spans themselves to
the next document for which the matches need to be kept. For each doc,
the Spans could then be copied by iterating until the next matching doc
in the search.
Even better would be to use a Filter in the search to limit the results to the 
matches that are immediately needed, but a Filter still requires a BitSet
over all indexed documents, and that is probably overkill for highlighting.
Iterating the Spans will be in doc number order, so some mapping back
to the scored order would still be needed.

I have not looked at any highlighting code yet. Is there already an extension
of PhraseQuery that has getSpans() ?

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Highlighter that works with phrase and span queries

Reply via email to