On Dec 8, 2005, at 10:51 AM, Sonja Löhr wrote:
Thank you both, I found it
(I really asked a bit too early, sorry)
The highlighter works correctly if I use my custom Analyzer during
indexing (and for QueryParser), BUT when preparing the TokenStream
to feed the highlighter, I must NOT use it.
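A possible explanation, offered as a sketch rather than a confirmed diagnosis: the highlighter slices the display text using the start and end character offsets carried by each token, so tokens produced from a differently processed copy of the text can point at the wrong characters. The dependency-free Java below illustrates only that effect. OffsetHighlightSketch, Tok, tokenize and highlight are hypothetical names, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetHighlightSketch {

    /** Hypothetical stand-in for an analyzer token: a term plus its character offsets. */
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits on whitespace, lower-cases each term, and records offsets into the given text. */
    static List<Tok> tokenize(String text) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) {
                out.add(new Tok(text.substring(start, i).toLowerCase(), start, i));
            }
        }
        return out;
    }

    /** Wraps each token equal to term in <B>..</B>, slicing displayText by the token's offsets. */
    static String highlight(String displayText, List<Tok> tokens, String term) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (Tok t : tokens) {
            // Skip non-matching tokens and offsets that do not fit the display text.
            if (!t.term.equals(term) || t.start < last || t.end > displayText.length()) continue;
            sb.append(displayText, last, t.start)
              .append("<B>").append(displayText, t.start, t.end).append("</B>");
            last = t.end;
        }
        sb.append(displayText.substring(last));
        return sb.toString();
    }

    public static void main(String[] args) {
        String shown = "Lucene  highlights   matching terms";
        String normalized = "Lucene highlights matching terms"; // whitespace collapsed

        // Offsets computed from the text that is actually displayed.
        System.out.println(highlight(shown, tokenize(shown), "matching"));
        // Offsets computed from a different copy of the text: the markup drifts.
        System.out.println(highlight(shown, tokenize(normalized), "matching"));
    }
}
```

The first call marks the word "matching" exactly. The second wraps the span "   match" instead, because its offsets were computed against the whitespace-collapsed copy.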
> > TokenStream tStream = analyzer.tokenStream("body", new
> > StringReader(bodyText));
> > return = highlighter.getBestFragments(tStream,
> bodyText, 4, " .....
> > ");
> > }
> >
> > (getDisplayText(URL url, Query query) fetches the document
> by its
se() instead of
MultiFieldQueryParser, but
the output didn't change.
Many, many thanks if you have read this far!
And even more if you have an idea where the error is likely to be
found.
sonja
-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, 8.
> if it comes from PdfBox, the wrong text is
> highlighted.
Wrong in what sense?
A couple of things to consider from looking at your
code.
* It is preferable to pass a rewritten query to the
highlighter (pass the same rewritten query to the searcher
if you want to avoid paying the query rewriting cost twice).
> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Thursday, December 8, 2005 10:59
> To: java-user@lucene.apache.org
> Subject: Re: pdf and highlighting
>
> Sonja,
>
> Do you have an example, or at least some relevant code, that
> would help t
Sonja,
Do you have an example, or at least some relevant code, that would
help the community in helping resolve this?
Erik
On Dec 8, 2005, at 4:24 AM, Sonja Löhr wrote:
Hi, all!
I have a question concerning analysis and highlighting. I'm indexing
multiple document formats (up to now, only HTML and PDF have occurred) and use the
highlighter from the Lucene sandbox.
The documents' text is extracted via JTidy and PDFBox, respectively, then in
both indexing and search anal