[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365733#comment-14365733 ]
Tim Allison commented on TIKA-1575:
-----------------------------------

If the multithreading hypothesis is correct, we had to get _extremely_ lucky because we're now clearing the resources on PDFont after every document, and it looks like the fonts are in the document, but they're clearly broken. So that means that thread B would have had to overwrite (correct) the font in thread A after thread A read the fonts for p. 14 but before it processed p 14...all while threads C through J didn't happen to hit clearResources() between the overwrite by Thread B and the processing by Thread A. Is this plausible? Are there other static objects that could explain this behavior? Something else going on?

> Upgrade to PDFBox 1.8.9 when available
> --------------------------------------
>
>                 Key: TIKA-1575
>                 URL: https://issues.apache.org/jira/browse/TIKA-1575
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: 005937.pdf.json, 005937_1_8_9-SNAPSHOT.pdf.json,
> 10-814_Appendix B_v3.pdf, PDFBox_1_8_8VPDFBox_1_8_9-SNAPSHOT.xlsx,
> PDFBox_1_8_8VPDFBox_1_8_9-SNAPSHOT_reports.zip,
> PDFBox_1_8_8Vs1_8_9_20150316.zip, content_diffs_20150316.xlsx
>
>
> The PDFBox community is about to release 1.8.9. Let's use this issue to
> track discussions before the release and to track Tika's upgrade to PDFBox
> 1.8.9

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)