On Wednesday 27 June 2007 17:17, mark harwood wrote:
> >>you would still have the major problem of which matches do you keep
> >>information for
>
> Yes, doing this efficiently is the main issue. Some vague thoughts I had:
>...
> 3) For each call to scorer.next() on the top level query, the
> HighlightObserver class would check to see if the doc was a "keeper"
> (i.e. its score places it in the required top "n" docs PriorityQueue)
> and if so, would retain a copy of all the transient match info currently
> held in the ThreadLocal for this doc and associate it with the new TopDoc
> object placed in the top docs PriorityQueue.

This can be done more efficiently by skipping the Spans themselves to the
next document for which the matches need to be kept. For each such doc, the
Spans could then be copied by iterating until the Spans move on to the next
matching doc in the search.

Even better would be to use a Filter in the search to limit the results to
the matches that are immediately needed, but a Filter still requires a
BitSet over all indexed documents, and that is probably overkill for
highlighting.

Iterating the Spans will be in doc number order, so some mapping back to
the scored order would still be needed.

I have not looked at any highlighting code yet. Is there already an
extension of PhraseQuery that has getSpans()?

Regards,
Paul Elschot
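
P.S. A rough sketch of the skipping idea, in case it helps the discussion. This is not Lucene code: SimpleSpans is a hypothetical stand-in modelled on the next()/skipTo()/doc()/start()/end() methods of Spans, ArraySpans is a toy in-memory implementation, and collect() is a name made up here. It assumes the doc numbers to keep are already known and sorted ascending.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SpanCollector {
    /** Hypothetical stand-in for Spans; starts positioned before the first match. */
    interface SimpleSpans {
        boolean next();              // move to the next match, false when exhausted
        boolean skipTo(int target);  // move forward to the first match with doc >= target
        int doc();
        int start();
        int end();
    }

    /** Toy implementation over {doc, start, end} triples sorted by doc (illustration only). */
    static class ArraySpans implements SimpleSpans {
        private final int[][] matches;
        private int pos = -1;

        ArraySpans(int[][] matches) { this.matches = matches; }

        public boolean next() {
            if (pos < matches.length) pos++;
            return pos < matches.length;
        }

        public boolean skipTo(int target) {
            // Always advances at least once, then continues until doc >= target.
            do {
                if (!next()) return false;
            } while (doc() < target);
            return true;
        }

        public int doc()   { return matches[pos][0]; }
        public int start() { return matches[pos][1]; }
        public int end()   { return matches[pos][2]; }
    }

    /**
     * Copies the {start, end} pairs for each kept doc, skipping over all other docs.
     * keptDocsAscending must be sorted in increasing doc number order.
     */
    static Map<Integer, List<int[]>> collect(SimpleSpans spans, int[] keptDocsAscending) {
        Map<Integer, List<int[]>> byDoc = new LinkedHashMap<>();
        boolean more = spans.next();
        for (int target : keptDocsAscending) {
            if (more && spans.doc() < target) {
                more = spans.skipTo(target);   // jump past docs that are not kept
            }
            if (!more) break;
            List<int[]> copied = new ArrayList<>();
            while (more && spans.doc() == target) {
                copied.add(new int[] { spans.start(), spans.end() });
                more = spans.next();
            }
            if (!copied.isEmpty()) byDoc.put(target, copied);
        }
        return byDoc;
    }
}
```

The map is keyed by doc number, so the mapping back to scored order is a lookup per hit while walking the top docs in score order. Kept docs for which the Spans have no match simply get no entry.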