Thx,

I'll try my best to keep the best of both worlds. My application is supposed
to parse different content types (pdf, rtf, doc, html, ...), so some work on
the provided BoilerPipe handler will be needed.

Cheers

On Mon, Jul 30, 2012 at 10:33 AM, Jukka Zitting <jukka.zitt...@gmail.com> wrote:

> Hi,
>
> On Fri, Jul 27, 2012 at 3:38 PM, Marc-Daniel Ortega
> <md.ort...@eligotech.com> wrote:
> > Caused by: org.apache.tika.sax.SecureContentHandler$SecureSAXException:
> > Suspected zip bomb: 100 levels of XML element nesting
>
> This could be caused by BoilerPipe not closing elements properly.
>
> The BoilerPipeContentHandler class was originally designed to be used
> on top of the Parser interface, not inside a Parser implementation,
> which might be the cause of your trouble. You could either look at
> adjusting BoilerPipeContentHandler so that it works also with your use
> case, or at changing your application code to use BoilerPipe on top of
> AutoDetectParser instead of inside it.
>
> BR,
>
> Jukka Zitting
>
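Jukka's second option, running BoilerPipe on top of AutoDetectParser rather than inside a Parser implementation, might look roughly like the sketch below. It assumes the Tika 1.x package layout (`org.apache.tika.parser.html.BoilerpipeContentHandler`; later Tika versions may place the class elsewhere), and the class name `BoilerpipeOnTop` and the helper `extractMainText` are hypothetical. Because Tika parsers report every supported format (pdf, rtf, doc, html, ...) as XHTML SAX events, the same wrapped handler is handed to AutoDetectParser regardless of the input type.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.BoilerpipeContentHandler; // Tika 1.x location (assumption)
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical example class: BoilerPipe wraps the application's handler
// and sits *outside* AutoDetectParser instead of inside a custom Parser.
public class BoilerpipeOnTop {

    /** Parses any format AutoDetectParser supports and returns BoilerPipe's main-text output. */
    static String extractMainText(InputStream in) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();

        // Collects the final text; -1 disables BodyContentHandler's default write limit.
        BodyContentHandler text = new BodyContentHandler(-1);

        // Decorates the text handler; BoilerPipe replays the content it keeps
        // into the decorated handler when the document ends.
        BoilerpipeContentHandler boilerpipe = new BoilerpipeContentHandler(text);

        // The parser sees BoilerPipe as an ordinary ContentHandler.
        parser.parse(in, boilerpipe, new Metadata(), new ParseContext());
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(extractMainText(in));
        }
    }
}
```

With this layout the parser owns the whole SAX event stream, so BoilerPipe only ever sees a complete document, which is the usage the handler was originally designed for.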
