Thanks, I'll do my best to keep the best of both worlds. I'm supposed to be able to parse different content types (PDF, RTF, DOC, HTML, ...), so some work on the provided BoilerPipe handler will be needed.
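To make sure I understand the failure mode described in the quoted reply below, here is a minimal sketch of how unclosed elements could push a nesting counter past its limit. It uses only the JDK's SAX classes. `NestingGuard` and `tripsGuard` are hypothetical names of mine, and this is not Tika's actual SecureContentHandler code; the limit of 100 and the `>=` comparison are assumptions chosen to mirror the error message.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Illustrative depth guard (hypothetical, NOT Tika's implementation):
 * counts open elements and fails once the nesting reaches maxDepth.
 */
class NestingGuard extends DefaultHandler {
    private final int maxDepth;
    private int depth = 0;

    NestingGuard(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        depth++;
        if (depth >= maxDepth) {
            throw new SAXException("Suspected zip bomb: " + depth
                    + " levels of XML element nesting");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    /**
     * Sends `count` sibling <p> start events to a fresh guard. When
     * closeElements is false the matching endElement is never sent, which
     * simulates a handler that forgets to close its elements.
     * Returns true if the guard rejected the event stream.
     */
    static boolean tripsGuard(int count, boolean closeElements, int maxDepth) {
        NestingGuard guard = new NestingGuard(maxDepth);
        AttributesImpl noAttributes = new AttributesImpl();
        try {
            for (int i = 0; i < count; i++) {
                guard.startElement("", "p", "p", noAttributes);
                if (closeElements) {
                    guard.endElement("", "p", "p");
                }
            }
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // 500 properly closed siblings never nest deeper than 1.
        System.out.println(tripsGuard(500, true, 100));   // prints "false"
        // The same 500 siblings without end events look like 500 nested levels.
        System.out.println(tripsGuard(500, false, 100));  // prints "true"
    }
}
```

If this is roughly what happens, a flat document with many paragraphs would be enough to trigger the exception as soon as the end events go missing, which would match what I'm seeing.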
Cheers

On Mon, Jul 30, 2012 at 10:33 AM, Jukka Zitting <jukka.zitt...@gmail.com> wrote:

> Hi,
>
> On Fri, Jul 27, 2012 at 3:38 PM, Marc-Daniel Ortega
> <md.ort...@eligotech.com> wrote:
> > Caused by: org.apache.tika.sax.SecureContentHandler$SecureSAXException:
> > Suspected zip bomb: 100 levels of XML element nesting
>
> This could be caused by BoilerPipe not closing elements properly.
>
> The BoilerPipeContentHandler class was originally designed to be used
> on top of the Parser interface, not inside a Parser implementation,
> which might be the cause of your trouble. You could either look at
> adjusting BoilerPipeContentHandler so that it works also with your use
> case, or at changing your application code to use BoilerPipe on top of
> instead of inside AutoDetectParser.
>
> BR,
>
> Jukka Zitting