t the best way to process, but I couldn't find a better one.
Please let me know if there is another (better) way to do it.
On Fri, Jan 28, 2011 at 4:41 PM, Karolina Bernat <
karolina.ber...@googlemail.com> wrote:
> Hello,
>
> since I moved on with my offset-info problem in HTML
Hello,
now that I have made progress on my offset-info problem in HTML files, I have
a new one: bringing the token position information together with the token/term
offset information. Can someone tell me how I can get a token if I know
its position? It would be nice to get the token's position from the
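
To make the position/offset question concrete, here is a minimal sketch in plain Java. It does not use Lucene at all; the class PositionIndexSketch, the type Tok and the whitespace tokenizer are hypothetical names made up for illustration. The assumption is that a token's position is simply its ordinal in the token sequence (in Lucene terms, what you would get by summing position increments while consuming a TokenStream), so keeping the tokens in a list makes position-to-token lookup a plain index access, and each token carries its own start/end offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: keeps tokens in position order
// so that "token at position p" is just list.get(p).
final class PositionIndexSketch {

    /** A token plus the character offsets it covers in the analyzed text. */
    static final class Tok {
        final String term;
        final int start; // inclusive character offset
        final int end;   // exclusive character offset

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits on whitespace; the list index of each token is its position. */
    static List<Tok> tokenize(String text) {
        List<Tok> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            // Consume one run of non-whitespace characters.
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) tokens.add(new Tok(text.substring(start, i), start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Tok> tokens = tokenize("part number A-123");
        Tok t = tokens.get(2); // look up the token at position 2
        System.out.println(t.term + " [" + t.start + "," + t.end + ")");
    }
}
```

With a real analyzer the positions need not be consecutive (for example when stop words are removed), so a map from position to token may fit better than a list; that detail is left out of the sketch.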
1:45 PM, Wulf Berschin wrote:
> Hi Karolina,
>
> yes (of course!). We have an XML element for the part numbers, but up to now
> they are not all tagged, so we need regex matching as well...
>
> On 28.01.2011 13:31, Karolina Bernat wrote:
>
>> Hi Wulf,
>>
Hi Wulf,
can I ask whether it is structured documentation (like XML or SGML) you're
dealing with? I ask because I also work with technical documentation and we
do exactly what you're asking for, but with XML data.
On Fri, Jan 28, 2011 at 1:05 PM, Wulf Berschin wrote:
> Hi,
>
> I'm poking in the d
' package javadocs to get information on how to do
> this. Note: This HTMLStripCharFilter does not need TIKA at all).
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> >
, just to get what I need
(highlighting of the hits within HTML files).
Thank you so much for your help:-)
Karo
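
For readers following the thread, the idea behind highlighting hits in the original HTML can be sketched in plain Java: strip the markup, but remember for every character of the visible text where it came from in the original file, then map a hit's offsets back and wrap it. This is only an illustration under simplifying assumptions. It is not Lucene's HTMLStripCharFilter; the class HtmlOffsetSketch and its methods are hypothetical; anything between '<' and '>' is treated as a tag; entities, comments and scripts are ignored; a hit that spans a tag boundary is not handled properly.

```java
import java.util.Arrays;

// Hypothetical helper, not a Lucene class: tag stripping with an offset map.
final class HtmlOffsetSketch {

    /** Visible text plus, for each of its chars, that char's index in the original HTML. */
    static final class Stripped {
        final String text;
        final int[] origIndex;

        Stripped(String text, int[] origIndex) {
            this.text = text;
            this.origIndex = origIndex;
        }
    }

    /** Removes everything between '<' and '>' and records original offsets. */
    static Stripped strip(String html) {
        StringBuilder sb = new StringBuilder();
        int[] map = new int[html.length()];
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                map[sb.length()] = i; // visible char number sb.length() sits at html index i
                sb.append(c);
            }
        }
        return new Stripped(sb.toString(), Arrays.copyOf(map, sb.length()));
    }

    /** Wraps every non-overlapping occurrence of term in <b>...</b> inside the original HTML. */
    static String highlight(String html, String term) {
        if (term.isEmpty()) return html;
        Stripped s = strip(html);
        StringBuilder out = new StringBuilder(html);
        // Work from the last hit to the first so earlier offsets stay valid.
        int from = s.text.lastIndexOf(term);
        while (from >= 0) {
            int origStart = s.origIndex[from];
            int origEnd = s.origIndex[from + term.length() - 1] + 1;
            out.insert(origEnd, "</b>");
            out.insert(origStart, "<b>");
            from = s.text.lastIndexOf(term, from - term.length());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("<p>Find the <i>part</i> number</p>", "part"));
    }
}
```

In a real Lucene setup the offsets would come from the analysis chain rather than from a substring search; the sketch only shows why offsets that refer to the original file are enough to place the highlight markup.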
On Mon, Jan 24, 2011 at 2:03 PM, Karolina Bernat <
karolina.ber...@googlemail.com> wrote:
> Hi all,
>
> I'm new to Lucene and have a question about indexing/highlighti
Hi all,
I'm new to Lucene and have a question about indexing/highlighting of HTML
files with Lucene.
What I need to do is highlight the hits (terms) in the original HTML file
(or get the positions of the terms/tokens in the original file).
This problem has already been described by Fred Toth in t