Hi all,

I'm new to Lucene and have a question about indexing/highlighting of HTML
files with Lucene.

What I need to do is highlight the hits (terms) in the original HTML file
(or get the positions of the terms/tokens in the original file).
This problem was already described by Fred Toth in a 2005 thread
(Preserving original HTML file offsets for highlighting, need
HTMLTokenizer?):

http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/%3c6.2.1.2.2.20050530134630.063ae...@fast.synernet.com%3E

I've searched the mailing list archives hoping for an answer, but I had no
luck.

Does anyone know whether there is a solution to this problem? If you know
that Lucene cannot highlight hits in the original HTML file, that would
also be helpful to hear (I could stop looking for it...).

Many thanks in advance!
Karo

P.S. I actually wanted to reply to the original thread/question from 2005.
Is there a way to do this? How can I post a reply to an old thread/mail from
the mailing list?
