On Thu, Sep 03, 2009 at 03:07:18PM +0200, Jukka Zitting wrote:
> Hi,
> 
> On Wed, Sep 2, 2009 at 2:40 PM, David Causse<dcau...@spotter.com> wrote:
> > If I use Tika to parse HTML and feed the parsed String to a Lucene
> > analyzer, what happens to the offset information needed for KWIC and
> > for going back to the original text (like the Google cache view)? How
> > can I keep track of the offsets between the Tika parser and the Lucene
> > analyzer?
> 
> Currently Tika doesn't expose that information, but the Tika Parser API
> was designed with such use in mind, so it will be possible to add the
> offset information. Please file a Tika feature request [1] for this.

I created TIKA-272. The idea behind it is to be able to use unmodified
Lucene analyzers with Tika while keeping the offsets correct.
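To make the idea more concrete, here is a minimal sketch in plain Java with no Tika or Lucene dependency. While extracting text, it records each extracted character's position in the original HTML. Offsets that an analyzer reports against the extracted text can then be translated back to the markup for KWIC or a cache-style view. The class `OffsetMappedText` and its methods are hypothetical illustrations, not Tika API. The naive tag stripping also ignores entities, comments and script blocks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Tika or Lucene): strips HTML tags and
// remembers, for every character of the extracted text, where it came from.
public class OffsetMappedText {
    private final StringBuilder text = new StringBuilder();
    // origOffsets.get(i) = offset in the HTML of extracted character i
    private final List<Integer> origOffsets = new ArrayList<>();

    public OffsetMappedText(String html) {
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;            // naive: entities/comments not handled
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                text.append(c);
                origOffsets.add(i);
            }
        }
    }

    public String text() {
        return text.toString();
    }

    // Maps a token start offset in the extracted text to the HTML.
    public int toOriginalStart(int textStart) {
        return origOffsets.get(textStart);
    }

    // Maps an exclusive token end offset: one past the last character's
    // position in the HTML.
    public int toOriginalEnd(int textEnd) {
        return origOffsets.get(textEnd - 1) + 1;
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p>";
        OffsetMappedText m = new OffsetMappedText(html);
        System.out.println(m.text());
        // An analyzer would report the token "world" at [6, 11) in the text.
        System.out.println(html.substring(m.toOriginalStart(6), m.toOriginalEnd(11)));
    }
}
```

Running `main` prints the extracted text `Hello world`, followed by `world` recovered from the original markup at offsets [12, 17).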

Thank you.

-- 
David Causse
Spotter
http://www.spotter.com/

