On Dec 8, 2005, at 10:51 AM, Sonja Löhr wrote:
Thank you both, I found it
(I really asked a bit too early, sorry)

The highlighter works correctly if I use my custom Analyzer during indexing
(and for QueryParser), BUT
when preparing the TokenStream to feed the highlighter, I must NOT use it.

TokenStream tStream = new GermanAnalyzer().tokenStream("body",
        new StringReader(bodyText));
System.out.println(highlighter.getBestFragments(tStream, bodyText, 4, " ..... "));

works, whereas

TokenStream tStream = new GermanHtmlAnalyzer().tokenStream("body",
        new StringReader(bodyText));
System.out.println(highlighter.getBestFragments(tStream, bodyText, 4, " ..... "));

gives rubbish highlighting.

GermanHtmlAnalyzer feeds a normal GermanAnalyzer with a shortened String
(entities replaced by their native characters) if the input contains decimal
or HTML entities. But then I'm totally confused why there is a problem with
PDF text and not with HTML text...

The likely reason is that the token offsets fed to the highlighter don't match the character positions in the text you're highlighting. You're generating token offsets from strings in which the entities have been replaced (so they are likely a different length), but highlighting the original text with the entities left intact.

Maybe??
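
To illustrate the suspected mismatch, here is a minimal sketch in plain Java with no Lucene involved. The class OffsetMismatchDemo and its methods are hypothetical stand-ins, not the real GermanHtmlAnalyzer or Highlighter: decodeEntities only handles &uuml;, and tokenOffsets is a simple whitespace tokenizer. It only shows what happens when offsets computed on the decoded string are applied to the original string.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetMismatchDemo {

    // Hypothetical stand-in for the entity replacement step; handles only &uuml;.
    public static String decodeEntities(String s) {
        return s.replace("&uuml;", "\u00fc");
    }

    // Whitespace tokenizer: returns {start, end} character offsets per token.
    public static List<int[]> tokenOffsets(String text) {
        List<int[]> offsets = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) offsets.add(new int[] {start, i});
        }
        return offsets;
    }

    // Cuts the text at the given offsets, clamping each offset to the text length.
    public static List<String> slice(String text, List<int[]> offsets) {
        List<String> out = new ArrayList<>();
        for (int[] o : offsets) {
            int s = Math.min(o[0], text.length());
            int e = Math.min(o[1], text.length());
            out.add(text.substring(s, e));
        }
        return out;
    }

    public static void main(String[] args) {
        String original = "Herr M&uuml;ller kommt";
        String decoded = decodeEntities(original);

        // Offsets are computed on the decoded (shorter) string.
        List<int[]> offsets = tokenOffsets(decoded);

        // Applied to the decoded string, the offsets line up with the words.
        System.out.println(slice(decoded, offsets));
        // Applied to the original string, everything after the entity is shifted.
        System.out.println(slice(original, offsets));
    }
}
```

If this is what is happening, the fragments cut from the original text would start and end in the wrong places from the first entity onward, which would look like rubbish highlighting.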

        Erik

