Re: Getting matched words for PhraseQuery or SpanNearQuery

Mark Miller Tue, 28 Apr 2009 05:22:15 -0700

The Span Highlighter gets positions by attempting to convert a standard Lucene Query to an approximate SpanQuery, and then calling getSpans on the span query to find the start and end positions (getSpans is called against a fast single-document MemoryIndex). You might check out WeightedSpanTermExtractor in the Highlighter package. It may be a bit hard to navigate for a new user, though.
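
To make the "start end positions" idea concrete, here is a rough plain-Java sketch of the kind of information a span match carries, using the [a b c a d e f a b] example from the question. It deliberately uses no Lucene classes: PhraseSpanSketch and findSpans are made-up names for illustration, it only handles exact in-order phrases with no slop, and treating the end position as exclusive is an assumption of this sketch. The real span machinery works on analyzed token positions in an index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical helper, not part of the Lucene API.
final class PhraseSpanSketch {

    /**
     * Returns one int[]{start, end} per place where the phrase terms occur
     * at consecutive token positions. The end position is exclusive
     * (an assumption of this sketch).
     */
    static List<int[]> findSpans(String[] tokens, String[] phrase) {
        List<int[]> spans = new ArrayList<>();
        if (phrase.length == 0) {
            return spans; // an empty phrase matches nothing
        }
        for (int start = 0; start + phrase.length <= tokens.length; start++) {
            boolean match = true;
            for (int i = 0; i < phrase.length; i++) {
                if (!tokens[start + i].equals(phrase[i])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                // Each entry groups the words that together form one hit.
                spans.add(new int[] {start, start + phrase.length});
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = "a b c a d e f a b".split(" ");
        String[] phrase = {"a", "b"};
        for (int[] span : findSpans(tokens, phrase)) {
            System.out.println(Arrays.toString(span)); // prints [0, 2] then [7, 9]
        }
    }
}
```

Each returned pair identifies one complete hit, which is the grouping the question asks for, as opposed to a flat list of individually highlighted words.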


- Mark


Jaco wrote:

Hello,

I am pretty new to the Lucene API, and there's something I can't figure out
from the docs and from the mailing list archives. I hope somebody can point
me in the right direction. Here's my case: for text analysis purposes I am
doing PhraseQueries and SpanNearQueries. Using the highlighter, I can
extract text snippets with matching words marked.

What I really am looking for is to extract information on each match to the
query, if possible including position information in the text. For example,
if the text I am searching in is [a b c a d e f a b], and my query is [a b],
then I want to know where the words [a b] were matched together in the text
due to the use of the PhraseQuery/SpanNearQuery ([a b] will get me two
occurrences in the document's text).

As far as I can find out, the highlighter is capable of marking the
individual words causing the hit, but it can't show me which words together
form one 'hit' to the search text. Is there a way to do this with the Lucene
API? Any help would be appreciated!

Thanks in advance, bye,

Jaco.

PS: this is a follow-up to this thread in the Solr user mailing list:
http://markmail.org/thread/cokya3rsmzsjocdh



--
- Mark

http://www.lucidimagination.com




