Re: Token position vs. token offset - how to bring them together?

2011-02-01 Thread Karolina Bernat
t the best way to process, but I couldn't find a better one. Please let me know if there is any other (better) way to do it. On Fri, Jan 28, 2011 at 4:41 PM, Karolina Bernat <karolina.ber...@googlemail.com> wrote: > Hello, > > since I moved on with my offset-info problem in HTML

Token position vs. token offset - how to bring them together?

2011-01-28 Thread Karolina Bernat
Hello, since I moved on with my offset-info problem in HTML files, I have run into a new one: bringing the token position information together with the token/term offset information. Can someone tell me how I can get a token if I know its position? It would be nice to get the token's position from the
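If, while analyzing the text, each token's position is recorded together with its start/end character offsets, getting the token at a known position becomes a simple lookup. In Lucene both values are available per token from a `TokenStream` (through its `PositionIncrementAttribute` and `OffsetAttribute`). The sketch below is not Lucene code but a plain-Python model of the same bookkeeping, assuming a simple word-regex tokenizer and a position increment of 1 per token (no stop-word gaps, no stacked tokens); `analyze` and `token_at` are hypothetical helper names.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    term: str          # normalized (lowercased) term text
    position: int      # 0-based token position (increment of 1 per token)
    start_offset: int  # index of the token's first character in the source text
    end_offset: int    # index one past the token's last character


def analyze(text: str) -> List[Token]:
    """Split text into word tokens, recording each token's position and offsets."""
    return [
        Token(m.group().lower(), position, m.start(), m.end())
        for position, m in enumerate(re.finditer(r"\w+", text))
    ]


def token_at(tokens: List[Token], position: int) -> Token:
    """Return the token at a given position.

    Positions are dense here (0, 1, 2, ...), so the list index equals the
    position; with position gaps or stacked tokens a dict/multimap keyed by
    position would be needed instead.
    """
    return tokens[position]


text = "Token position vs. token offset"
tokens = analyze(text)
tok = token_at(tokens, 0)
# The offsets lead back to the original (un-normalized) characters.
print(tok.term, tok.start_offset, tok.end_offset, text[tok.start_offset:tok.end_offset])
# prints: token 0 5 Token
```

The term is normalized, but the offsets still recover the exact source characters, which is what a highlighter needs.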

Re: Re: How to index part numbers

2011-01-28 Thread Karolina Bernat
1:45 PM, Wulf Berschin wrote: > Hi Karolina, > > yes (of course!) We have an XML element for the part numbers, but up to now > they are not all tagged, so we need regex matching as well... > > On 28.01.2011 13:31, Karolina Bernat wrote: > >> Hi Wulf, >>
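For part numbers that are not yet tagged in the XML, regex matching means scanning the text for the part-number pattern, keeping each match's offsets, and indexing a normalized form so that differently written queries hit the same term. A minimal Python sketch; the part-number format below (2 to 4 uppercase letters, a hyphen, 4 to 6 digits) is a hypothetical assumption, since the thread does not state the real format, and `find_part_numbers`/`normalize` are illustrative names.

```python
import re
from typing import List, Tuple

# HYPOTHETICAL part-number format (not from the thread): 2-4 uppercase
# letters, a hyphen, then 4-6 digits, e.g. "AB-12345".
PART_NUMBER = re.compile(r"\b[A-Z]{2,4}-\d{4,6}\b")


def find_part_numbers(text: str) -> List[Tuple[str, int, int]]:
    """Return (part_number, start_offset, end_offset) for each regex match."""
    return [(m.group(), m.start(), m.end()) for m in PART_NUMBER.finditer(text)]


def normalize(part_number: str) -> str:
    """Index-side normalization: drop the hyphen and lowercase, so that
    'AB-12345' and 'ab12345' map to the same indexed term."""
    return part_number.replace("-", "").lower()


text = "Replace valve AB-12345 with kit XYZ-0042 before use."
for part, start, end in find_part_numbers(text):
    print(part, start, end, normalize(part))
```

Keeping the offsets alongside each match means untagged part numbers can be highlighted the same way as tagged ones.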

Re: How to index part numbers

2011-01-28 Thread Karolina Bernat
Hi Wulf, may I ask whether it is structured documentation (like XML or SGML) you're dealing with? I ask because I also work with technical documentation, and we do exactly what you're asking about, but with XML data. On Fri, Jan 28, 2011 at 1:05 PM, Wulf Berschin wrote: > Hi, > > I'm poking in the d

Re: Preserving original HTML file offsets for highlighting

2011-01-26 Thread Karolina Bernat
' package javadocs to get information on how to do > this. Note: This HTMLStripCharFilter does not need TIKA at all. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de
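The idea behind a CharFilter such as HTMLStripCharFilter is that markup is removed before tokenization, while token offsets are corrected so they still point into the original HTML. That bookkeeping can be modeled by remembering, for every character of the stripped text, its index in the source. The sketch below is plain Python, not the Lucene class: a naive tag stripper assuming no comments, scripts, entities, or `>` inside attribute values; `strip_tags` and `to_original_span` are hypothetical names.

```python
from typing import List, Tuple


def strip_tags(html: str) -> Tuple[str, List[int]]:
    """Remove <...> tags; return (plain_text, offset_map).

    offset_map[i] is the index in `html` of plain_text[i].
    Naive: ignores comments, scripts, entities and '>' inside attributes.
    """
    text_chars: List[str] = []
    offset_map: List[int] = []
    in_tag = False
    for i, ch in enumerate(html):
        if ch == "<":
            in_tag = True
        elif ch == ">" and in_tag:
            in_tag = False
        elif not in_tag:
            text_chars.append(ch)
            offset_map.append(i)
    return "".join(text_chars), offset_map


def to_original_span(offset_map: List[int], start: int, end: int) -> Tuple[int, int]:
    """Map a non-empty [start, end) span in the stripped text back to the HTML."""
    return offset_map[start], offset_map[end - 1] + 1


html = "<p>Hello <b>Lucene</b> users</p>"
text, offset_map = strip_tags(html)
start = text.index("Lucene")
orig_start, orig_end = to_original_span(offset_map, start, start + len("Lucene"))
print(text, "|", html[orig_start:orig_end])
# prints: Hello Lucene users | Lucene
```

The end offset is mapped through the last character of the span (plus one) rather than through `end` itself, because the character right after a token may sit behind a tag in the original HTML.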

Re: Preserving original HTML file offsets for highlighting

2011-01-25 Thread Karolina Bernat
, just to get what I need (highlighting of the hits within HTML files). Thank you so much for your help :-) Karo On Mon, Jan 24, 2011 at 2:03 PM, Karolina Bernat <karolina.ber...@googlemail.com> wrote: > Hi all, > > I'm new to Lucene and have a question about indexing/highlighti

Preserving original HTML file offsets for highlighting

2011-01-24 Thread Karolina Bernat
Hi all, I'm new to Lucene and have a question about indexing/highlighting of HTML files with Lucene. What I need to do is highlight the hits (terms) in the original HTML file (or get the positions of the terms/tokens in the original file). This problem has already been described by Fred Toth in t
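Once the hit offsets refer to the original HTML file, highlighting the hits there amounts to inserting markup at those offsets. A minimal sketch, assuming the offsets are already corrected to the original file and that the spans do not overlap; the `<mark>` tag and the `highlight` function are illustrative choices, not part of Lucene's Highlighter.

```python
from typing import List, Tuple


def highlight(html: str, spans: List[Tuple[int, int]],
              pre: str = "<mark>", post: str = "</mark>") -> str:
    """Wrap each [start, end) span of the original HTML in pre/post markup.

    Spans are applied from right to left, so inserting markup never shifts
    the offsets of spans that have not been processed yet.
    Assumes the spans do not overlap.
    """
    result = html
    for start, end in sorted(spans, reverse=True):
        result = result[:start] + pre + result[start:end] + post + result[end:]
    return result


html = "<p>Hello <b>Lucene</b> users</p>"
print(highlight(html, [(12, 18), (23, 28)]))
# prints: <p>Hello <b><mark>Lucene</mark></b> <mark>users</mark></p>
```

Processing the spans from the end of the file backwards avoids having to recompute offsets after each insertion.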