Hi,

On Wed, Sep 2, 2009 at 2:40 PM, David Causse<dcau...@spotter.com> wrote:
> If I use tika for parsing HTML code and inject parsed String to a lucene
> analyzer. What about the offset information for KWIC and return to text
> (like the google cache view)? how can I keep track of the offsets
> between tika parser and lucene analyzer?

Currently Tika doesn't expose that information but the Tika Parser API
was designed for such use in mind, so it will be possible to add the
offset information. Please file a Tika feature request [1] for this.

[1] https://issues.apache.org/jira/browse/TIKA

BR,

Jukka Zitting

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to