Stephen H created TIKA-4453:
-------------------------------

             Summary: ForkParser fails on documents with more than 100 embedded 
documents
                 Key: TIKA-4453
                 URL: https://issues.apache.org/jira/browse/TIKA-4453
             Project: Tika
          Issue Type: Bug
          Components: core
    Affects Versions: 3.2.1
            Reporter: Stephen H
         Attachments: forkparser-patch.txt

ForkParser uses RecursiveMetadataContentHandlerProxy, which overrides 
endEmbeddedDocument() but does not call the superclass method. As a result, 
embeddedDepth in AbstractRecursiveParserWrapperHandler is incremented for each 
new embedded document but never decremented. Once 100 embedded documents have 
been processed, the counter reaches the maximum depth and 
AbstractRecursiveParserWrapperHandler.startEmbeddedDocument() throws a 
SAXException.
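
The failure mode can be illustrated with a minimal, self-contained sketch. 
This is not Tika code: BaseHandler and LeakyProxy are hypothetical stand-ins 
for AbstractRecursiveParserWrapperHandler and 
RecursiveMetadataContentHandlerProxy, the method signatures are simplified, and 
the limit of 100 and the exact comparison used are assumptions based on the 
description above.

```java
import org.xml.sax.SAXException;

public class DepthLeakSketch {

    // Hypothetical stand-in for AbstractRecursiveParserWrapperHandler.
    static class BaseHandler {
        // Assumed limit, taken from the behaviour described in this report.
        static final int MAX_DEPTH = 100;
        protected int embeddedDepth = 0;

        public void startEmbeddedDocument() throws SAXException {
            embeddedDepth++;
            if (embeddedDepth >= MAX_DEPTH) {
                throw new SAXException("Max embedded depth reached: " + embeddedDepth);
            }
        }

        public void endEmbeddedDocument() throws SAXException {
            embeddedDepth--;
        }
    }

    // Hypothetical stand-in for RecursiveMetadataContentHandlerProxy.
    static class LeakyProxy extends BaseHandler {
        @Override
        public void endEmbeddedDocument() {
            // Proxy-specific work would go here. The superclass method is not
            // called, so embeddedDepth is never decremented.
        }
    }

    // Feeds `total` sibling (non-nested) embedded documents to the handler and
    // returns how many were processed before a SAXException was thrown.
    static int countProcessed(BaseHandler handler, int total) {
        for (int i = 0; i < total; i++) {
            try {
                handler.startEmbeddedDocument();
                handler.endEmbeddedDocument();
            } catch (SAXException e) {
                return i;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("base handler: " + countProcessed(new BaseHandler(), 110));
        System.out.println("leaky proxy:  " + countProcessed(new LeakyProxy(), 110));
    }
}
```

In this sketch the base handler gets through all 110 sibling documents, while 
the leaky proxy stops early even though no document is actually nested inside 
another, because its counter behaves like a running total instead of a depth.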

The attached patch adds a new method to AbstractRecursiveParserWrapperHandler 
that decrements the depth; RecursiveMetadataContentHandlerProxy's 
endEmbeddedDocument() now calls it. The patch also adds a ForkParser test for a 
document with 110 embedded documents.
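
The shape of such a fix might look roughly like the sketch below. It is not 
the attached patch: BaseHandler and FixedProxy are hypothetical stand-ins for 
the two Tika classes, decrementEmbeddedDepth() is an invented name for the new 
method, and the limit of 100 is an assumption based on the reported behaviour.

```java
import org.xml.sax.SAXException;

public class DepthFixSketch {

    // Hypothetical stand-in for AbstractRecursiveParserWrapperHandler.
    static class BaseHandler {
        static final int MAX_DEPTH = 100; // assumed limit
        private int embeddedDepth = 0;

        public void startEmbeddedDocument() throws SAXException {
            embeddedDepth++;
            if (embeddedDepth >= MAX_DEPTH) {
                throw new SAXException("Max embedded depth reached: " + embeddedDepth);
            }
        }

        public void endEmbeddedDocument() throws SAXException {
            decrementEmbeddedDepth();
        }

        // Hypothetical name for the new method: lets subclasses that replace
        // endEmbeddedDocument() still keep the depth counter balanced.
        protected void decrementEmbeddedDepth() {
            embeddedDepth--;
        }

        int getEmbeddedDepth() {
            return embeddedDepth;
        }
    }

    // Hypothetical stand-in for RecursiveMetadataContentHandlerProxy.
    static class FixedProxy extends BaseHandler {
        @Override
        public void endEmbeddedDocument() {
            // Proxy-specific work would go here, followed by the decrement.
            decrementEmbeddedDepth();
        }
    }

    // Returns true if `total` sibling embedded documents are processed without
    // a SAXException and the depth counter ends back at zero.
    static boolean parsesAll(BaseHandler handler, int total) {
        try {
            for (int i = 0; i < total; i++) {
                handler.startEmbeddedDocument();
                handler.endEmbeddedDocument();
            }
        } catch (SAXException e) {
            return false;
        }
        return handler.getEmbeddedDepth() == 0;
    }

    public static void main(String[] args) {
        System.out.println("110 documents parsed: " + parsesAll(new FixedProxy(), 110));
    }
}
```

A check with 110 documents, mirroring the size used by the new test, 
exercises the case that crosses the limit only when the counter leaks.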



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
