Hi Mark,
Thanks for that - after wading through some source code in the highlighter
package and reading more docs I managed to get out the info I needed by
getting the start and end token position of each span found and subsequently
getting the words back out of the TokenStream that I initially created.
The Span Highlighter gets positions by attempting to convert a standard
Lucene Query to an approximate SpanQuery, and then calling getSpans on the
span query to find the start and end positions (getSpans is called against a
fast, single-document MemoryIndex). You might check out
WeightedSpanTermExtractor.
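
For anyone following along, here is a rough sketch of the "map span positions back to words" step described at the top of this thread. It is plain Java and deliberately avoids Lucene calls, since the getSpans signature differs between versions. The class SpanWords, the whitespace tokenize helper (standing in for a real analyzer's TokenStream), and the hard-coded span positions are all made up for illustration. It also assumes the end position is exclusive, which is my understanding of how Lucene's Spans reports it, so please verify that against the version you are using.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpanWords {

    // Hypothetical stand-in for an analyzer: one token per whitespace-separated word,
    // so a token's list index plays the role of its position.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Returns the tokens at positions [start, end); end is treated as exclusive (assumption).
    static List<String> wordsInSpan(List<String> tokens, int start, int end) {
        return new ArrayList<>(tokens.subList(start, end));
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("the quick brown fox jumps over the lazy dog");

        // Made-up (start, end) pairs, as if collected while iterating over the spans.
        int[][] spans = { {1, 4}, {6, 9} };

        for (int[] span : spans) {
            List<String> words = wordsInSpan(tokens, span[0], span[1]);
            System.out.println(span[0] + "-" + span[1] + ": " + String.join(" ", words));
        }
    }
}
```

With a real analyzer, tokens can share a position or skip one (synonyms, removed stop words), so you would likely need to track each token's position increment rather than rely on the list index as this sketch does.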