[ 
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365733#comment-14365733
 ] 

Tim Allison commented on TIKA-1575:
-----------------------------------

If the multithreading hypothesis is correct, we would have had to get 
_extremely_ lucky, because we're now clearing the resources on PDFont after 
every document, and it looks like the fonts are in the document, but they're 
clearly broken.  So that means that thread B would have had to overwrite 
(correct) the font in thread A after thread A read the fonts for p. 14 but 
before it processed p. 14...all while threads C through J didn't happen to hit 
clearResources() between the overwrite by thread B and the processing by 
thread A.  Is this plausible?  Are there other static objects that could 
explain this behavior?  Is something else going on?
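
To make the required interleaving concrete, here is a minimal sketch of the 
sequence described above.  It is not PDFBox code: the class 
SharedFontCacheSketch, its static CACHE map, the font key "F1", and the 
fallback-on-miss behavior are all assumptions made for illustration, and the 
steps are replayed sequentially rather than on real threads so the outcome is 
deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a process-wide static font cache; NOT PDFBox code.
class SharedFontCacheSketch {
    // One cache shared by every "thread", keyed by font name (assumed layout).
    static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    // Mimics the idea of clearing shared resources after each document.
    static void clearResources() {
        CACHE.clear();
    }

    // Replays one interleaving step by step and returns the font that
    // thread A ends up using when it processes p. 14.
    static String replay(boolean otherThreadClearsInWindow) {
        CACHE.clear();
        // Thread A: reads the fonts for p. 14 and caches its (broken) copy.
        CACHE.put("F1", "broken-from-A");
        // Thread B: working on another document, overwrites the same key.
        CACHE.put("F1", "good-from-B");
        // Threads C..J: any one finishing a document here empties the cache.
        if (otherThreadClearsInWindow) {
            clearResources();
        }
        // Thread A: processes p. 14; on a cache miss it is assumed to fall
        // back to its own broken copy.
        return CACHE.getOrDefault("F1", "broken-from-A");
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // good-from-B
        System.out.println(replay(true));  // broken-from-A
    }
}
```

Under these assumptions, thread A only sees the corrected font when no other 
thread clears the cache inside the window between thread B's overwrite and 
thread A's processing, which is the coincidence the hypothesis depends on.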

> Upgrade to PDFBox 1.8.9 when available
> --------------------------------------
>
>                 Key: TIKA-1575
>                 URL: https://issues.apache.org/jira/browse/TIKA-1575
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: 005937.pdf.json, 005937_1_8_9-SNAPSHOT.pdf.json, 
> 10-814_Appendix B_v3.pdf, PDFBox_1_8_8VPDFBox_1_8_9-SNAPSHOT.xlsx, 
> PDFBox_1_8_8VPDFBox_1_8_9-SNAPSHOT_reports.zip, 
> PDFBox_1_8_8Vs1_8_9_20150316.zip, content_diffs_20150316.xlsx
>
>
> The PDFBox community is about to release 1.8.9.  Let's use this issue to 
> track discussions before the release and to track Tika's upgrade to PDFBox 
> 1.8.9.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
