Re: pdf and highlighting

2005-12-08 Thread Erik Hatcher
On Dec 8, 2005, at 10:51 AM, Sonja Löhr wrote: Thank you both, I found it (I really asked a bit too early, sorry). The highlighter works correctly if I use my custom Analyzer during indexing (and for QueryParser), BUT when preparing the TokenStream to feed the highlighter, I must NOT use it.

RE: pdf and highlighting

2005-12-08 Thread Sonja Löhr
> > TokenStream tStream = analyzer.tokenStream("body", new StringReader(bodyText));
> > return = highlighter.getBestFragments(tStream, bodyText, 4, " ..... ");
> > }
> >
> > (getDisplayText(URL url, Query query) fetches the document by its

Re: pdf and highlighting

2005-12-08 Thread Erik Hatcher
se() instead of MultiFieldQueryParser, but the output didn't change. Many many thanks if you read until here! And even more if you have an idea where the error is likely to be found. sonja -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Donnerstag, 8.

RE: pdf and highlighting

2005-12-08 Thread mark harwood
> if it comes from PdfBox, the wrong text is > highlighted. Wrong in what sense? A couple of things to consider from looking at your code. * It is preferable to pass a rewritten query to the highlighter (pass the same rewritten query to searcher if you want to avoid query rewriting costs twice).
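To illustrate the point about rewriting: as far as I understand it, the highlighter works from the concrete terms it can extract from the query, and a prefix or wildcard query only turns into concrete terms once it has been rewritten against the index. The sketch below is a toy model in plain Java, not Lucene code. The class RewriteSketch and the methods expandPrefix and highlight are hypothetical names made up for this example, and the term list stands in for a real index.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: none of these names belong to Lucene.
class RewriteSketch {

    // Stand-in for query rewriting: expand a prefix into the concrete
    // terms that exist in the (hypothetical) index term list.
    static Set<String> expandPrefix(String prefix, List<String> indexTerms) {
        Set<String> expanded = new TreeSet<>();
        for (String term : indexTerms) {
            if (term.startsWith(prefix)) {
                expanded.add(term);
            }
        }
        return expanded;
    }

    // Stand-in for a highlighter: wrap every word whose lower-cased form
    // is one of the query terms in <b>...</b>, leaving other text as is.
    static String highlight(String text, Set<String> queryTerms) {
        Matcher m = Pattern.compile("\\w+").matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            String word = m.group();
            if (queryTerms.contains(word.toLowerCase(Locale.ROOT))) {
                out.append("<b>").append(word).append("</b>");
            } else {
                out.append(word);
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> indexTerms = List.of("highlight", "highlighter", "index", "pdf");
        String text = "The Highlighter marks each highlight in the pdf.";

        // Unrewritten: the only "term" is the raw pattern, which equals no word.
        System.out.println(highlight(text, Set.of("highl*")));

        // Rewritten: the prefix has been expanded into real index terms.
        System.out.println(highlight(text, expandPrefix("highl", indexTerms)));
    }
}
```

With the unrewritten pattern nothing is marked; after expansion both "Highlighter" and "highlight" are wrapped. In the real API the rewritten Query would be the one handed to both the searcher and the highlighter, as suggested above.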

RE: pdf and highlighting

2005-12-08 Thread Sonja Löhr
ginal Message----- > From: Erik Hatcher [mailto:[EMAIL PROTECTED] > Sent: Donnerstag, 8. Dezember 2005 10:59 > To: java-user@lucene.apache.org > Subject: Re: pdf and highlighting > > Sonja, > > Do you have an example, or at least some relevant code, that > would help t

Re: pdf and highlighting

2005-12-08 Thread Erik Hatcher
Sonja, Do you have an example, or at least some relevant code, that would help the community in helping resolve this? Erik On Dec 8, 2005, at 4:24 AM, Sonja Löhr wrote: Hi, all! I have a question concerning analysis and highlighting. I'm indexing multiple document formats (up to

pdf and highlighting

2005-12-08 Thread Sonja Löhr
Hi, all! I have a question concerning analysis and highlighting. I'm indexing multiple document formats (up to now, only html and pdf occurred), and use the highlighter from the Lucene sandbox. The documents' text is extracted via JTidy and PDFBox, respectively, then in both indexing and search anal