Gábor Stefanik created PDFBOX-5879:
--------------------------------------

             Summary: Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
                 Key: PDFBOX-5879
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5879
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 3.0.3 PDFBox
            Reporter: Gábor Stefanik
         Attachments: MVM_Aram_augusztus.pdf

{code:java}
java -jar pdfbox-app-3.0.3.jar export:text -console -rotationMagic -i="MVM_Aram_augusztus.pdf"
{code}
fails with the following error:
{code:java}
java.lang.ClassCastException: class org.apache.pdfbox.cos.COSObject cannot be cast to class org.apache.pdfbox.cos.COSArray (org.apache.pdfbox.cos.COSObject and org.apache.pdfbox.cos.COSArray are in unnamed module of loader 'app')
    at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:336)
    at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:225)
    at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:62)
    at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
    at picocli.CommandLine.access$1500(CommandLine.java:148)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
    at picocli.CommandLine.execute(CommandLine.java:2174)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:76)
{code}
The same command succeeds in 3.0.2.

The triggering PDF can be downloaded from [https://nagykorosiallatmentok.hu/wp-content/uploads/2023/09/MVM_Aram_augusztus.pdf], and is also attached.

The root cause appears to be this change: [https://github.com/apache/pdfbox/commit/b03d12d56dd74e5c52d80cf0b80c5bfb1f3209b2] from PDFBOX-5841.
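
For illustration only, the failure pattern named in the exception (an indirect reference being cast straight to an array) can be modelled with a minimal, self-contained sketch. The types below (Node, StreamNode, ArrayNode, RefNode) and both methods are hypothetical stand-ins, not PDFBox classes; per the exception message, RefNode plays roughly the role of COSObject and ArrayNode that of COSArray. It is an assumption that the code at ExtractText.java:336 follows this shape, and the sketch is not the actual fix.

```java
import java.util.Arrays;
import java.util.List;

public class ContentsCastSketch {
    /** Hypothetical stand-in for a generic PDF object (not a PDFBox type). */
    interface Node {}

    /** Hypothetical stand-in for a single content stream. */
    static final class StreamNode implements Node {}

    /** Hypothetical stand-in for an array of content streams. */
    static final class ArrayNode implements Node {
        final List<Node> items;
        ArrayNode(Node... items) { this.items = Arrays.asList(items); }
    }

    /** Hypothetical stand-in for an indirect reference to another object. */
    static final class RefNode implements Node {
        final Node target;
        RefNode(Node target) { this.target = target; }
    }

    /** Fragile: assumes the page contents entry is always a direct array. */
    static int countStreamsUnsafe(Node contents) {
        return ((ArrayNode) contents).items.size();
    }

    /** Defensive: follow references first, then accept an array or a single stream. */
    static int countStreamsSafe(Node contents) {
        Node resolved = contents;
        while (resolved instanceof RefNode) {
            resolved = ((RefNode) resolved).target;
        }
        if (resolved instanceof ArrayNode) {
            return ((ArrayNode) resolved).items.size();
        }
        return resolved instanceof StreamNode ? 1 : 0;
    }

    public static void main(String[] args) {
        // A page whose contents entry is a reference to an array of two streams.
        Node contents = new RefNode(new ArrayNode(new StreamNode(), new StreamNode()));
        System.out.println("safe count: " + countStreamsSafe(contents));
        try {
            countStreamsUnsafe(contents);
        } catch (ClassCastException e) {
            System.out.println("unsafe cast failed with ClassCastException");
        }
    }
}
```

The defensive variant resolves the reference before any type check, so the same code path handles a direct array, a referenced array, and a page with a single content stream.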