Hi Uwe,

thanks for this hint. I'm not sure how much of the Solr functionality I
need to implement in order to use the HTMLStripCharFilter. I'm using Apache
Tika for HTML parsing, and I use the StandardAnalyzer to initialize my
IndexWriter. I don't use a Tokenizer directly; would that be the Solr approach?

At this point, I'm not sure how to use Solr within my application, where I
already use Lucene. Can I use just this one class, or a few classes, from
the Solr core while indexing with the Lucene IndexWriter? Or do I need to
switch my indexing and searching to the Solr way just to get what I need
(highlighting of the hits within HTML files)?

Thank you so much for your help :-)
Karo



On Mon, Jan 24, 2011 at 2:03 PM, Karolina Bernat <
karolina.ber...@googlemail.com> wrote:

> Hi all,
>
> I'm new to Lucene and have a question about indexing/highlighting of HTML
> files with Lucene.
>
> What I need to do is highlight the hits (terms) in the original HTML file
> (or get the positions of the terms/tokens in the original file).
> This problem has already been described by Fred Toth in this thread in 2005
> (Preserving original HTML file offsets for highlighting, need
> HTMLTokenizer?):
>
>
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/%3c6.2.1.2.2.20050530134630.063ae...@fast.synernet.com%3E
>
> I've searched the mailing list archives hoping for an answer, but I had no
> luck.
>
> Does anyone know whether there is a solution to this problem? If you know
> that Lucene cannot highlight the hits in the original HTML file, that would
> also be helpful to know (I could stop looking for it...).
>
> Many thanks in advance!
> Karo
>
> P.S. Actually, I wanted to reply to the original thread/question from 2005.
> Is there a way to do this? How can I post a reply to an old thread/mail from
> the mailing list?
>
