[ 
https://issues.apache.org/jira/browse/TIKA-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004547#comment-18004547
 ] 

Tim Allison edited comment on TIKA-4453 at 7/10/25 9:14 PM:
------------------------------------------------------------

[~steveaitch], let me know what you think of these slight mods:
[https://github.com/apache/tika/pull/2278]

Thank you, again!



> ForkParser fails on documents with more than 100 embedded documents
> -------------------------------------------------------------------
>
>                 Key: TIKA-4453
>                 URL: https://issues.apache.org/jira/browse/TIKA-4453
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.2.1
>            Reporter: Stephen H
>            Priority: Minor
>             Fix For: 4.0.0, 3.2.2
>
>         Attachments: forkparser-patch.txt
>
>
> ForkParser uses RecursiveMetadataContentHandlerProxy, which overrides 
> endEmbeddedDocument() but does not call the superclass method. As a result, 
> the embeddedDepth in AbstractRecursiveParserWrapperHandler is incremented 
> for each new embedded document but never decremented. Once it has counted 
> 100 embedded documents, it reaches the maximum depth and a SAXException is 
> thrown by AbstractRecursiveParserWrapperHandler's startEmbeddedDocument().
> The attached patch adds a new method to AbstractRecursiveParserWrapperHandler 
> that decrements the depth; RecursiveMetadataContentHandlerProxy's 
> endEmbeddedDocument() now calls it. The patch also adds a ForkParser test 
> for a document with 110 embedded documents.
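
For anyone following along, here is a minimal, self-contained sketch of the failure mode described above. It is not Tika's code or the attached patch: the class names (DepthSketch, BaseHandler, LeakyProxy, FixedProxy), the decrementEmbeddedDepth() helper, and the exact form of the limit check are assumptions made for illustration. Only the overall shape (a depth counter that is incremented on start, a subclass that overrides endEmbeddedDocument() without calling the superclass, and a limit of 100) comes from the issue description.

```java
import org.xml.sax.SAXException;

class DepthSketch {

    /** Hypothetical stand-in for the base handler; not the real Tika class. */
    static class BaseHandler {
        // Limit of 100 taken from the issue description; the exact check is assumed.
        static final int MAX_DEPTH = 100;
        protected int embeddedDepth = 0;

        void startEmbeddedDocument() throws SAXException {
            embeddedDepth++;
            if (embeddedDepth > MAX_DEPTH) {
                throw new SAXException("Max embedded depth reached: " + embeddedDepth);
            }
        }

        void endEmbeddedDocument() throws SAXException {
            embeddedDepth--;
        }

        /** Hypothetical helper in the spirit of the patch: decrement only. */
        protected void decrementEmbeddedDepth() {
            embeddedDepth--;
        }
    }

    /** Overrides endEmbeddedDocument() without calling super: depth only grows. */
    static class LeakyProxy extends BaseHandler {
        @Override
        void endEmbeddedDocument() {
            // forwards the event elsewhere, but never touches embeddedDepth
        }
    }

    /** Same override, but it restores the depth through the helper. */
    static class FixedProxy extends BaseHandler {
        @Override
        void endEmbeddedDocument() {
            decrementEmbeddedDepth();
        }
    }

    /**
     * Feeds `documents` sibling (non-nested) embedded documents to the handler.
     * Returns the 1-based index of the first document that fails, or -1 if none does.
     */
    static int firstFailure(BaseHandler handler, int documents) {
        for (int i = 1; i <= documents; i++) {
            try {
                handler.startEmbeddedDocument();
                handler.endEmbeddedDocument();
            } catch (SAXException e) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + firstFailure(new LeakyProxy(), 110)); // prints "leaky: 101"
        System.out.println("fixed: " + firstFailure(new FixedProxy(), 110)); // prints "fixed: -1"
    }
}
```

In this sketch the 110 documents are siblings, so the real nesting depth never exceeds 1; the leaky variant still fails because the counter effectively tracks the total number of embedded documents rather than the depth.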



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
