On Thu, Sep 03, 2009 at 03:07:18PM +0200, Jukka Zitting wrote: > Hi, > > On Wed, Sep 2, 2009 at 2:40 PM, David Causse<dcau...@spotter.com> wrote: > > If I use tika for parsing HTML code and inject parsed String to a lucene > > analyzer. What about the offset information for KWIC and return to text > > (like the google cache view)? how can I keep track of the offsets > > between tika parser and lucene analyzer? > > Currently Tika doesn't expose that information but the Tika Parser API > was designed for such use in mind, so it will be possible to add the > offset information. Please file a Tika feature request [1] for this.
I created TIKA-272, the idea behind is to be able to use unmodified lucene analyzers with tika and keep offset correctness. Thank you. -- David Causse Spotter http://www.spotter.com/ --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org