Also, there's a default limit of 10,000 tokens per field at index time (see IndexWriter.setMaxFieldLength).

Erick

On 2/9/07, mark harwood <[EMAIL PROTECTED]> wrote:

See Highlighter.setMaxDocBytesToAnalyze(int byteCount)

Its default setting is limited in order to avoid excessive response times.
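
To illustrate why a cap on analyzed text produces truncated output, here is a minimal, self-contained sketch in plain Java. It is not Lucene's implementation: the class AnalyzeLimitSketch, the highlight method, and the maxCharsToAnalyze parameter are all hypothetical stand-ins. It assumes a highlighter that stops consuming tokens once a token starts at or beyond the cap, and that appends the unanalyzed tail only when the whole text fits within the cap. If memory serves, the default cap in Lucene releases of this era was 50*1024 (51,200), which would be roughly consistent with the 51316 characters reported below once highlight tags are counted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnalyzeLimitSketch {

    /**
     * Toy highlighter (hypothetical, not Lucene code): wraps whole-word,
     * case-insensitive matches of `term` in <B> tags, but stops analyzing
     * once a word starts at or beyond maxCharsToAnalyze.
     */
    static String highlight(String text, String term, int maxCharsToAnalyze) {
        StringBuilder out = new StringBuilder();
        Matcher words = Pattern.compile("\\w+").matcher(text);
        int lastEnd = 0;

        while (words.find()) {
            // Assumed cutoff rule: give up as soon as a token begins past the cap.
            if (words.start() >= maxCharsToAnalyze) {
                break;
            }
            // Copy the gap (whitespace, punctuation) before this word unchanged.
            out.append(text, lastEnd, words.start());
            String word = words.group();
            if (word.equalsIgnoreCase(term)) {
                out.append("<B>").append(word).append("</B>");
            } else {
                out.append(word);
            }
            lastEnd = words.end();
        }

        // Append the remaining tail only if the whole text fit within the cap;
        // otherwise the output simply ends where analysis stopped.
        if (lastEnd < text.length() && text.length() <= maxCharsToAnalyze) {
            out.append(text, lastEnd, text.length());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "lucene indexes text and lucene highlights text";
        // Cap smaller than the text: the output is cut off.
        System.out.println(highlight(text, "lucene", 20));
        // Cap covering the whole text: the full field comes back.
        System.out.println(highlight(text, "lucene", text.length()));
    }
}
```

With the real Highlighter, the corresponding remedy is the call named above: pass setMaxDocBytesToAnalyze a value at least as large as the field text before calling getBestFragment.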

Cheers
Mark


----- Original Message ----
From: Fred Eaker <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, 9 February, 2007 4:28:36 PM
Subject: Highlighter returning incomplete field text

Is there a limit to how many characters a Highlighter or NullFragmenter will
return?

I have indexed an entire HTML document (145kb). When I use the highlighter
with a NullFragmenter, the getBestFragment and getBestFragments methods
return the text of the field up to 51316 characters.

I have tried indexing other HTML documents as well, but get the same results.

If I change the Highlighter's Encoder to DefaultEncoder, I get more
characters, but not the entire field.

Here is some code:

Highlighter highlighter =
    new Highlighter(new SimpleHTMLFormatter(),
                    new DefaultEncoder(),
                    new QueryScorer(query));

highlighter.setTextFragmenter(new NullFragmenter());

TokenStream tokenStream =
    LuceneUtils.getAnalyzer().tokenStream(
        fieldName,
        new StringReader(hit.get(fieldName)));

String highlightedHit =
    highlighter.getBestFragment(tokenStream, hit.get(fieldName));


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]






