Stephen H created TIKA-4453:
-------------------------------

             Summary: ForkParser fails on documents with more than 100 embedded documents
                 Key: TIKA-4453
                 URL: https://issues.apache.org/jira/browse/TIKA-4453
             Project: Tika
          Issue Type: Bug
          Components: core
    Affects Versions: 3.2.1
            Reporter: Stephen H
         Attachments: forkparser-patch.txt

ForkParser uses RecursiveMetadataContentHandlerProxy, which overrides endEmbeddedDocument() but does not call the superclass method. As a result, embeddedDepth in AbstractRecursiveParserWrapperHandler is incremented for each new embedded document but never decremented. Once 100 embedded documents have been processed, the counter reaches the maximum depth and AbstractRecursiveParserWrapperHandler.startEmbeddedDocument() throws a SAXException.

The attached patch adds a method to AbstractRecursiveParserWrapperHandler that decrements the depth, and RecursiveMetadataContentHandlerProxy.endEmbeddedDocument() now calls it. The patch also adds a ForkParser test for a document with 110 embedded documents.
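The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the Tika code or the attached patch: DepthTrackingHandler, BuggyProxy, FixedProxy, EmbeddedDepthDemo and decrementEmbeddedDepth() are hypothetical stand-ins, the limit of 100 is taken from this report, the exact comparison against the limit is an assumption, and an IllegalStateException stands in for the SAXException to keep the example self-contained.

```java
// Hypothetical stand-in for AbstractRecursiveParserWrapperHandler (not the real class).
class DepthTrackingHandler {
    static final int MAX_DEPTH = 100; // limit taken from the report
    private int embeddedDepth = 0;

    void startEmbeddedDocument() {
        embeddedDepth++;
        // Assumption: the real check may use a different comparison.
        if (embeddedDepth > MAX_DEPTH) {
            throw new IllegalStateException("Max embedded depth reached: " + embeddedDepth);
        }
    }

    void endEmbeddedDocument() {
        decrementEmbeddedDepth();
    }

    // Hypothetical helper, analogous to the new method the patch describes.
    protected void decrementEmbeddedDepth() {
        embeddedDepth--;
    }
}

// Stand-in for the proxy before the patch: overrides without calling super.
class BuggyProxy extends DepthTrackingHandler {
    @Override
    void endEmbeddedDocument() {
        // Proxy-specific work would go here; the depth is never decremented.
    }
}

// Stand-in for the proxy after the patch: decrements the depth explicitly.
class FixedProxy extends DepthTrackingHandler {
    @Override
    void endEmbeddedDocument() {
        // Proxy-specific work would go here.
        decrementEmbeddedDepth();
    }
}

class EmbeddedDepthDemo {
    // Feeds n sibling (non-nested) embedded documents through the handler
    // and returns how many completed before the depth check failed.
    static int countProcessed(DepthTrackingHandler handler, int n) {
        int processed = 0;
        try {
            for (int i = 0; i < n; i++) {
                handler.startEmbeddedDocument();
                handler.endEmbeddedDocument();
                processed++;
            }
        } catch (IllegalStateException e) {
            // Depth limit hit; stop counting.
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + countProcessed(new BuggyProxy(), 110));
        System.out.println("fixed: " + countProcessed(new FixedProxy(), 110));
    }
}
```

In this sketch the embedded documents are siblings, so the true nesting depth never exceeds 1. The buggy proxy still trips the limit partway through the 110 documents, while the fixed proxy processes all of them.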