Re: Getting matched words for PhraseQuery or SpanNearQuery

Jaco Tue, 28 Apr 2009 12:36:52 -0700

Hi Mark,

Thanks for that - after wading through some source code in the highlighter
package and reading more docs I managed to get out the info I needed by
getting the start and end token position of each span found and subsequently
getting the words back out of the TokenStream that I initially created.
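
A minimal sketch of that idea, in plain Java without Lucene: it assumes the analyzed tokens are kept in a list, and the hypothetical helper findPhraseSpans stands in for iterating over the spans and reading their start and end positions. The class and method names are illustrative only; they are not part of the Lucene API and not the code actually used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the approach described above; it does not use Lucene.
class PhraseSpanDemo {

    // Returns {start, end} token positions (end exclusive) for every place
    // where the phrase tokens occur consecutively in the text tokens.
    static List<int[]> findPhraseSpans(List<String> tokens, List<String> phrase) {
        List<int[]> spans = new ArrayList<>();
        if (phrase.isEmpty()) {
            return spans;
        }
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            int end = start + phrase.size();
            if (tokens.subList(start, end).equals(phrase)) {
                spans.add(new int[] {start, end});
            }
        }
        return spans;
    }

    // Recovers the matched words from the token list kept from analysis.
    static List<String> wordsFor(List<String> tokens, int[] span) {
        return new ArrayList<>(tokens.subList(span[0], span[1]));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a b c a d e f a b".split(" "));
        List<String> phrase = Arrays.asList("a", "b");
        for (int[] span : findPhraseSpans(tokens, phrase)) {
            System.out.println(span[0] + "-" + span[1] + ": " + wordsFor(tokens, span));
        }
    }
}
```

For the text [a b c a d e f a b] and the phrase [a b], this reports two spans, at token positions 0-2 and 7-9 (end exclusive), each mapped back to the words [a, b].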


Thanks again, bye,

Jaco.

2009/4/28 Mark Miller <markrmil...@gmail.com>

> The Span Highlighter gets positions by attempting to convert a standard
> Lucene Query to an approximate SpanQuery, and then calling getSpans on the
> span query to find start and end positions (getSpans is called against a
> fast single-document MemoryIndex). You might check out
> WeightedSpanTermExtractor in the Highlighter package. It may be a bit hard
> to navigate for a new user, though.
>
> - Mark
>
>
> Jaco wrote:
>
>> Hello,
>>
>> I am pretty new to the Lucene API, and there's something I can't figure
>> out from the docs and from the mailing list archives. I hope somebody can
>> point me in the right direction. Here's my case: for text analysis
>> purposes I am doing PhraseQueries and SpanNearQueries. Using the
>> highlighter, I can extract text snippets with matching words marked.
>>
>> What I really am looking for is to extract information on each match to
>> the query, if possible including position information in the text. For
>> example, if the text I am searching in is [a b c a d e f a b], and my
>> query is [a b], then I want to know where the words [a b] were matched
>> together in the text due to the use of the PhraseQuery/SpanNearQuery
>> ([a b] will get me two occurrences in the document's text).
>>
>> As far as I can find out, the highlighter is capable of marking the
>> individual words causing the hit, but it can't show me which words
>> together form one 'hit' to the search text. Is there a way to do this
>> with the Lucene API? Any help would be appreciated!
>>
>> Thanks in advance, bye,
>>
>> Jaco.
>>
>> PS this is a follow up for this thread in the Solr user mailing list:
>> http://markmail.org/thread/cokya3rsmzsjocdh
>>
>>
>>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
