[ https://issues.apache.org/jira/browse/TIKA-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862894#comment-17862894 ]

ASF GitHub Bot commented on TIKA-3347:
--------------------------------------

kbachuHighSpot commented on PR #1473:
URL: https://github.com/apache/tika/pull/1473#issuecomment-2207526085

   I was able to use Tika `3.0.0-BETA`, with `pdfbox` at `3.0.2`.
   
   I'm seeing the exception below. Any ideas? Am I missing anything?
   
   ```
   java.io.IOException: Invalid character code 0xD800
        at org.apache.fontbox.ttf.CmapSubtable.processSubtype13(CmapSubtable.java:320)
        at org.apache.fontbox.ttf.CmapSubtable.initSubtable(CmapSubtable.java:113)
        at org.apache.fontbox.ttf.CmapTable.read(CmapTable.java:87)
        at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:365)
        at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:165)
        at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:144)
        at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:56)
        at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:27)
        at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:66)
        at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:50)
        at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.addTrueTypeFont(FileSystemFontProvider.java:684)
        at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.scanFonts(FileSystemFontProvider.java:390)
        at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.<init>(FileSystemFontProvider.java:365)
        at org.apache.pdfbox.pdmodel.font.FontMapperImpl$DefaultFontProvider.<clinit>(FontMapperImpl.java:139)
        at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getProvider(FontMapperImpl.java:158)
        at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFont(FontMapperImpl.java:410)
        at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getTrueTypeFont(FontMapperImpl.java:318)
        at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:142)
        at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:153)
        at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:170)
        at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:72)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:893)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:531)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:506)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:153)
        at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:153)
        at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:362)
        at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:137)
        at org.apache.tika.parser.pdf.AbstractPDF2XHTML.processPages(AbstractPDF2XHTML.java:1369)
        at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:235)
        at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:108)
        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:215)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:203)
        at com.xxx.content.Tika.extract(Tika.java:49)
        at com.xxx.content.Tika.extractText(Tika.java:33)
   ```
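
A note on the message itself: `0xD800` is the first value of the UTF-16 surrogate range (`0xD800` to `0xDFFF`), which is not a valid Unicode scalar value. The trace shows the exception being raised under `FileSystemFontProvider.scanFonts`, so it probably comes from a font installed on the machine rather than from the PDF being parsed. The sketch below only illustrates the kind of range check that would reject such a value. `CharCodeCheck` and `isUnicodeScalarValue` are hypothetical names, not fontbox or Tika API, and whether fontbox performs exactly this check is an assumption.

```java
// Hypothetical helper for illustration only; not part of fontbox, PDFBox, or Tika.
final class CharCodeCheck {

    private CharCodeCheck() {
    }

    /**
     * Returns true if the code is a Unicode scalar value: inside
     * 0..0x10FFFF and outside the surrogate range 0xD800..0xDFFF.
     */
    static boolean isUnicodeScalarValue(int code) {
        boolean inUnicodeRange = Character.isValidCodePoint(code);
        boolean isSurrogate = code >= Character.MIN_SURROGATE
                && code <= Character.MAX_SURROGATE;
        return inUnicodeRange && !isSurrogate;
    }

    public static void main(String[] args) {
        // The value from the exception message, plus two comparison points.
        int[] samples = {0xD800, 0x0041, 0x110000};
        for (int code : samples) {
            System.out.println(String.format("0x%X -> %b", code, isUnicodeScalarValue(code)));
        }
    }
}
```

A cmap entry that starts a range at a surrogate value would fail a check like this, which matches the reported message.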




> Upgrade to PDFBox 3.x when available
> ------------------------------------
>
>                 Key: TIKA-3347
>                 URL: https://issues.apache.org/jira/browse/TIKA-3347
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> 3.0.0-RC1 was recently released.  We should integrate it on a dev branch asap 
> so that we can help with regression testing...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)