Re: pdf and highlighting

Erik Hatcher Thu, 08 Dec 2005 12:08:44 -0800


On Dec 8, 2005, at 10:51 AM, Sonja Löhr wrote:

Thank you both, I found it
(I really asked a bit too early, sorry).
The highlighter works correctly if I use my custom Analyzer during indexing
(and for QueryParser), BUT
when preparing the TokenStream to feed the highlighter, I must NOT use it.
TokenStream tStream = new GermanAnalyzer().tokenStream("body", new StringReader(bodyText));
System.out.println(highlighter.getBestFragments(tStream, bodyText, 4, " ..... "));

works, whereas

TokenStream tStream = new GermanHtmlAnalyzer().tokenStream("body", new StringReader(bodyText));
System.out.println(highlighter.getBestFragments(tStream, bodyText, 4, " ..... "));

gives rubbish highlighting.
GermanHtmlAnalyzer feeds a normal GermanAnalyzer with a shortened String (native characters) if the input contains decimal or HTML entities, but then I'm totally confused why there is a problem with PDF text and not with HTML text...
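To make the symptom concrete, here is a small standalone Java sketch (no Lucene involved) of what happens when offsets are computed on a decoded string but applied to the undecoded one. decodeEntities is a hypothetical stand-in for the entity replacement GermanHtmlAnalyzer is described as doing; it handles only the two numeric references in the sample string, not HTML entities in general.

```java
public class OffsetMismatchDemo {
    // Hypothetical stand-in for the decoding step: replaces two decimal
    // character references with the native characters they denote.
    static String decodeEntities(String s) {
        return s.replace("&#252;", "\u00fc").replace("&#223;", "\u00df");
    }

    public static void main(String[] args) {
        String original = "Gr&#252;&#223;e aus M&#252;nchen";
        String decoded = decodeEntities(original); // "Grüße aus München", shorter than original
        String word = "M\u00fcnchen";

        // Offsets as an analyzer running on the DECODED text would report them.
        int start = decoded.indexOf(word);
        int end = start + word.length();
        System.out.println("offsets: " + start + "-" + end);          // offsets: 10-17

        // Applying those same offsets to the ORIGINAL text, as the highlighter does.
        System.out.println("highlighted: " + original.substring(start, end)); // highlighted: 223;e a
    }
}
```

Every entity before a token shifts it further, which is why the highlighted fragments look like rubbish rather than merely off by one.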

The likely reason is that the token offsets fed to the highlighter don't jibe with the character positions in the text you're highlighting. You're generating token offsets for strings that have been replaced (and likely have different lengths), but highlighting the original text with the entities left intact.
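One way to reconcile the two is to record, while decoding, where each decoded character came from in the original text, and translate token offsets back through that map before they reach the highlighter (the same idea as the offset correction later Lucene releases provide via CharFilter). The simpler alternative is to pass the decoded string, not the original, to getBestFragments. Below is a minimal sketch of the mapping approach, assuming only decimal references like &#252; need decoding; the class OffsetMappingDecoder and its members are hypothetical, not part of any Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetMappingDecoder {
    final String decoded;
    // offsets[i] = position in the original text that decoded char i came from;
    // the extra last entry is original.length(), so token END offsets map cleanly.
    final int[] offsets;

    OffsetMappingDecoder(String original) {
        StringBuilder out = new StringBuilder();
        List<Integer> map = new ArrayList<>();
        int i = 0;
        while (i < original.length()) {
            // Sketch assumption: only decimal references (&#NNN;) are decoded.
            if (original.startsWith("&#", i)) {
                int semi = original.indexOf(';', i + 2);
                int digits = semi - (i + 2);
                if (semi > i + 2 && digits <= 7 && isAsciiDigits(original, i + 2, semi)) {
                    int cp = Integer.parseInt(original.substring(i + 2, semi));
                    if (Character.isValidCodePoint(cp)) {
                        int before = out.length();
                        out.appendCodePoint(cp);
                        // One or two chars (surrogate pair) all map to the '&'.
                        for (int k = before; k < out.length(); k++) map.add(i);
                        i = semi + 1;
                        continue;
                    }
                }
            }
            out.append(original.charAt(i)); // ordinary char: copy 1:1
            map.add(i);
            i++;
        }
        map.add(original.length());
        decoded = out.toString();
        offsets = new int[map.size()];
        for (int k = 0; k < offsets.length; k++) offsets[k] = map.get(k);
    }

    static boolean isAsciiDigits(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            char c = s.charAt(k);
            if (c < '0' || c > '9') return false;
        }
        return true;
    }

    // Translate an offset in the decoded text to one in the original text.
    int toOriginal(int decodedOffset) {
        return offsets[decodedOffset];
    }

    public static void main(String[] args) {
        String original = "Gr&#252;&#223;e aus M&#252;nchen";
        OffsetMappingDecoder d = new OffsetMappingDecoder(original);
        String word = "M\u00fcnchen";
        int start = d.decoded.indexOf(word);
        int end = start + word.length();
        // prints "M&#252;nchen": the right span of the undecoded text
        System.out.println(original.substring(d.toOriginal(start), d.toOriginal(end)));
    }
}
```

In a custom analyzer, tokens would be produced from decoded, and each token's start and end offsets would go through toOriginal before the TokenStream is handed to the highlighter together with the original text.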


Maybe??

        Erik

