The Span Highlighter gets positions by attempting to convert a standard
Lucene Query to an approximate SpanQuery, and then calling getSpans on the
span query to find the start and end positions (getSpans is called against a
fast single-document MemoryIndex). You might check out
WeightedSpanTermExtractor in the Highlighter package. It may be a bit
hard to navigate for a new user, though.
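
To make the kind of output concrete, here is a rough plain-Java sketch of
what "one hit = one start/end pair" looks like for your [a b] example. It
does not use Lucene at all: the class PhraseSpanSketch and the method
findSpans are made-up names for illustration, it only handles an exact
phrase with no slop, and it assumes whitespace tokenization. Positions are
token positions with an exclusive end, which as far as I know is the same
convention the Lucene Spans start/end values follow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a hypothetical helper, not part of the Lucene API.
public class PhraseSpanSketch {

    /**
     * Returns one {start, end} pair of token positions (end exclusive)
     * for every place where the phrase occurs contiguously in the tokens.
     */
    public static List<int[]> findSpans(List<String> tokens, List<String> phrase) {
        List<int[]> spans = new ArrayList<>();
        if (phrase.isEmpty()) {
            return spans; // nothing to match
        }
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            // Compare the window of tokens at this position with the phrase.
            if (tokens.subList(start, start + phrase.size()).equals(phrase)) {
                spans.add(new int[] {start, start + phrase.size()});
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        // Assumption: simple whitespace tokenization of the example text.
        List<String> tokens = Arrays.asList("a b c a d e f a b".split(" "));
        List<String> phrase = Arrays.asList("a", "b");
        for (int[] span : findSpans(tokens, phrase)) {
            System.out.println("match at tokens [" + span[0] + ", " + span[1] + ")");
        }
    }
}
```

For your text this prints two matches, [0, 2) and [7, 9), and leaves out the
lone "a" at position 3. That grouping of words into a single hit is what
getSpans gives you and what the plain term highlighting does not.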
- Mark
Jaco wrote:
Hello,
I am pretty new to the Lucene API, and there's something I can't figure out
from the docs and from the mailing list archives. I hope somebody can point
me in the right direction. Here's my case: for text analysis purposes I am
doing PhraseQueries and SpanNearQueries. Using the highlighter, I can
extract text snippets with matching words marked.
What I really am looking for is to extract information on each match to the
query, if possible including position information in the text. For example,
if the text I am searching in is [a b c a d e f a b], and my query is [a b],
then I want to know where the words [a b] were matched together in the text
due to the use of the PhraseQuery/SpanNearQuery ([a b] will get me two
occurrences in the document's text).
As far as I can find out, the highlighter is capable of marking the
individual words causing the hit, but it can't show me which words together
form one 'hit' to the search text. Is there a way to do this with the Lucene
API? Any help would be appreciated!
Thanks in advance, bye,
Jaco.
PS: this is a follow-up to this thread in the Solr user mailing list:
http://markmail.org/thread/cokya3rsmzsjocdh
--
- Mark
http://www.lucidimagination.com