Hi,

On Wed, Sep 2, 2009 at 2:40 PM, David Causse <dcau...@spotter.com> wrote:
> If I use tika for parsing HTML code and inject parsed String to a lucene
> analyzer. What about the offset information for KWIC and return to text
> (like the google cache view)? how can I keep track of the offsets
> between tika parser and lucene analyzer?

Currently Tika doesn't expose that information, but the Tika Parser API
was designed with such use in mind, so it will be possible to add the
offset information. Please file a Tika feature request [1] for this.

[1] https://issues.apache.org/jira/browse/TIKA

BR,

Jukka Zitting

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
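
Until such a feature exists, the underlying idea can be sketched without Tika at all. The offsets a Lucene analyzer reports are relative to the string it was handed, so mapping them back to the original HTML needs a table recording where each extracted character came from. The class below is a hypothetical illustration only: `OffsetTrackingExtractor`, `Extracted`, `extract` and `toSourceRange` are made-up names, not part of Tika or Lucene, and the tag stripping is deliberately naive (no entities, comments or scripts, and a literal `>` in text is dropped).

```java
import java.util.Arrays;

// Hypothetical helper, not a Tika or Lucene API: strips tags from markup
// while remembering the source index of every character it keeps.
final class OffsetTrackingExtractor {

    /** Extracted text plus, for each extracted char, its index in the source. */
    static final class Extracted {
        final String text;
        final int[] sourceOffsets;

        Extracted(String text, int[] sourceOffsets) {
            this.text = text;
            this.sourceOffsets = sourceOffsets;
        }
    }

    /** Naive tag stripper: keeps only characters found outside <...>. */
    static Extracted extract(String html) {
        StringBuilder text = new StringBuilder();
        int[] offsets = new int[html.length()];
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>') {
                inTag = false;
            } else if (!inTag) {
                // Record where this extracted character sits in the source.
                offsets[text.length()] = i;
                text.append(c);
            }
        }
        return new Extracted(text.toString(),
                Arrays.copyOf(offsets, text.length()));
    }

    /**
     * Maps a non-empty [start, end) range in the extracted text (for example
     * a token's start and end offsets) to a [start, end) range in the source.
     */
    static int[] toSourceRange(Extracted e, int start, int end) {
        return new int[] { e.sourceOffsets[start], e.sourceOffsets[end - 1] + 1 };
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p>";
        Extracted e = extract(html);
        int start = e.text.indexOf("world");
        int[] range = toSourceRange(e, start, start + "world".length());
        System.out.println(e.text);
        System.out.println(html.substring(range[0], range[1]));
    }
}
```

A KWIC or cache-style view would then pass each token's start and end offsets through `toSourceRange` to find the matching span in the original document. If a token spans a tag boundary, the mapped source range includes the intervening markup.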