Actually, this second issue was opened since Highlight seams to ignore 
positions and treats WITH_POSITIONS_OFFSETS like it was WITH_OFFSETS.

https://issues.apache.org/jira/browse/LUCENE-3091

As far as I remember, the trouble is that to trust positions in the tokenstream 
built from termvector, you have to know the field properties, and it isn't 
accessible at the code level where the decision is made to use offset or 
positions. So some modifications are to be made to pass this information with 
the token stream, or to give access to field properties to the highlighter. 
Neither of those seamed straightforward. But I really did take a very short 
look so I'm sure of nothing there.

I hope that someone of greater vision will find an elegant solution to this :). 
But otherwise I hope to find some time to take a look in a couple weeks, while 
I've part of the context still in mind.

Pierre

-----Message d'origine-----
De : Joel Halbert [mailto:j...@su3analytics.com] 
Envoyé : vendredi 27 mai 2011 14:05
À : java-user@lucene.apache.org
Objet : RE: FastVectorHighlighter.getBestFragments returning null

Hi Pierre,

Thanks for the pointer. So if I understand correctly this bug definitely
applies to fields with TermVector.WITH_OFFSETS.

My field uses TermVector.WITH_POSITIONS_OFFSETS)

I wasn't sure from the bug report if it applies to
WITH_POSITIONS_OFFSETS as well? It looks like it might?

- Joel

On Fri, 2011-05-27 at 13:56 +0200, Pierre GOSSE wrote:

> Hi,
> 
> Maybe is it related to :
> https://issues.apache.org/jira/browse/LUCENE-3087
> 
> Pierre
> 
> -----Message d'origine-----
> De : Joel Halbert [mailto:j...@su3analytics.com] 
> Envoyé : vendredi 27 mai 2011 12:57
> À : lucene users
> Objet : FastVectorHighlighter.getBestFragments returning null
> 
> Hi,
> 
> I'm using Lucene 3.0.3. I'm extracting snippets using
> FastVectorHighlighter, for some snippets (I think always when searching
> for exact matches, quoted) the fragment is null.
> 
> Code looks like:
> 
> 
>                       query = QueryParser.escape(query);
>                       if (exact) {
>                               query = "\""+query+"\"";
>                       }
>                         BooleanQuery allQ = new BooleanQuery();
>                       Query bodyQ = new QueryParser(Version.LUCENE_30, BODY, 
> analyser).parse(query);
>                       termQ.add(new BooleanClause(bodyQ, Occur.SHOULD));
>                         // add more queries
>                         allQ.add(new BooleanClause(termQ, Occur.MUST));
>                         
>                       TopDocs res = is.search(allQ, null, upperRange);        
>                       FastVectorHighlighter highlighter = new 
> FastVectorHighlighter(true, true);
>                       
>                       for (int i = in.getLowerRange(); i < 
> Math.min(res.totalHits, upperRange); i++) {
> 
>                               String[] bodyFrags =
>                                               
> highlighter.getBestFragments(highlighter.getFieldQuery(bodyQ),
>                                               is.getIndexReader(), 
> res.scoreDocs[i].doc, BODY, 120, 2);
>                 
>                                 // bodyFrags is null
>                     }
> 
> 
> I do get a hit, and the content with the exact match is coming from the
> BODY field, but I cann't seem to get the fragment out.
> 
> Any clues,
> 
> Thanks
> 
> - Joel


Reply via email to