Hi Mark, Thanks for that - after wading through some source code in the highlighter package and reading more docs I managed to get out the info I needed by getting the start and end token position of each span found and subsequently getting the words back out of the TokenStream that I initially created.
Thanks again, bye, Jaco. 2009/4/28 Mark Miller <markrmil...@gmail.com> > The Span Highlighter gets positions by attempting to convert a standard > Lucne Query to a SpanQuery approximate, and then calling getSpans on the > span query to find start end positions (getSpans is called against a fast > single document MemoryIndex). You might check out WeightedSpanTermExtractor > in the Highlighter package. It may be a bit hard to navigate for a new user > though. > > - Mark > > > Jaco wrote: > >> Hello, >> >> I am pretty new to the Lucene API, and there's something I can't figure >> out >> from the docs and from the mailing list archives. I hope somebody can >> point >> me into the right direction. Here's my case: for text analysis purposes I >> am >> doing PhraseQueries and SpanNearQueries. Using the highlighter, I can >> extract text snippets with matching words marked. >> >> What I really am looking for is to extract information on each match to >> the >> query, if possible including position information in the text. For >> example, >> if the text I am searching in is [a b c a d e f a b], and my query is [a >> b], >> then I want to know where the words [a b] were matched together in the >> text >> due to the use of the PhraseQuery/SpanNearQuery ([a b] will get me two >> occurrences in the documents text). >> >> As far as I can find out, the highlighter is capable of marking the >> individual words causing the hit, but it can't show me which words >> together >> form one 'hit' to the search text. Is there a way to do this with the >> Lucene >> API? Any help would be appreciated! >> >> Thanks in advance, bye, >> >> Jaco. >> >> PS this is a follow up for this thread in the Solr user mailing list: >> http://markmail.org/thread/cokya3rsmzsjocdh >> >> >> > > > -- > - Mark > > http://www.lucidimagination.com > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >