[ https://issues.apache.org/jira/browse/PDFBOX-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820953#comment-13820953 ]
Sharmilee S commented on PDFBOX-1779:
-------------------------------------
I tried that, and now I am getting this:
ExtractText failed with the following exception:
java.io.IOException: Missing end of file marker '%%EOF'
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.getStartxrefOffset(NonSequentialPDFParser.java:576)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:325)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:700)
at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1245)
at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:208)
at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
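
Both exceptions in this thread are about file markers the parser could not find: the first fails while reading the header line, and this one fails looking for '%%EOF'. A quick way to see whether the file reaching the JVM is empty or truncated, independent of PDFBox, is a small check like the sketch below. It uses only the Java standard library. The class name PdfMarkerCheck, the strict "header at byte 0" test, and the 2048-byte tail window are assumptions made for illustration, not PDFBox behavior.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
class PdfMarkerCheck {

    /** True if the given leading bytes start with the "%PDF-" header. */
    static boolean hasHeader(byte[] head) {
        return new String(head, StandardCharsets.ISO_8859_1).startsWith("%PDF-");
    }

    /** True if the given trailing bytes contain the "%%EOF" marker. */
    static boolean hasEofMarker(byte[] tail) {
        return new String(tail, StandardCharsets.ISO_8859_1).contains("%%EOF");
    }

    /** Reads exactly len bytes starting at file offset pos. */
    static byte[] read(RandomAccessFile f, long pos, int len) throws IOException {
        byte[] buf = new byte[len];
        f.seek(pos);
        f.readFully(buf);
        return buf;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfMarkerCheck <file>");
            return;
        }
        try (RandomAccessFile f = new RandomAccessFile(args[0], "r")) {
            long size = f.length();
            // Assumed window sizes: 8 bytes for the header, 2048 for the tail.
            int headLen = (int) Math.min(size, 8);
            int tailLen = (int) Math.min(size, 2048);
            System.out.println("size in bytes: " + size);
            System.out.println("%PDF- header:  " + hasHeader(read(f, 0, headLen)));
            System.out.println("%%EOF marker:  " + hasEofMarker(read(f, size - tailLen, tailLen)));
        }
    }
}
```

If this reports a size of 0 or no header for the path the shell script passes in, the problem is likely in how the file is produced or named before PDFBox ever sees it; if only the '%%EOF' marker is missing, the file is probably truncated.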
> Error: End-of-File, expected line
> ----------------------------------
>
> Key: PDFBOX-1779
> URL: https://issues.apache.org/jira/browse/PDFBOX-1779
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.8.2
> Environment: Linux
> pdfbox 1.8.2
> Reporter: Sharmilee S
> Priority: Critical
> Labels: linux, pdf, pdfbox, textExtraction
>
> I get this exception when the filename is passed from a shell script on Linux.
> ExtractText failed with the following exception:
> java.io.IOException: Error: End-of-File, expected line
> at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1489)
> at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:298)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:173)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1211)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1176)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1101)
> at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:212)
> at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
> at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
--
This message was sent by Atlassian JIRA
(v6.1#6144)