[ https://issues.apache.org/jira/browse/PDFBOX-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler closed PDFBOX-586.
-------------------------------------
    Resolution: Won't Fix

I guess this ticket is no longer relevant. There is an Android port of PDFBox [available|https://github.com/TomRoush/PdfBox-Android]

> Text Extraction on Android
> --------------------------
>
>                 Key: PDFBOX-586
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-586
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Text extraction
>    Affects Versions: 1.1.0
>         Environment: Windows XP + Eclipse + PDFBox sources
>            Reporter: Bernard
>            Priority: Major
>              Labels: modularization
>         Attachments: ASEB-Camping_Car_ou_Bateau.pdf,
> EncryptedFileTest_AES.pdf, EncryptedFileTest_RC4.pdf, Eval.pdf,
> PDFBOX586-ASEB-Camping_Car_ou_Bateau.txt, PDFBOX586-Eval.txt,
> PDFBOX586-internals.txt, TestPDFBox.zip, internals.pdf
>
>
> Hi,
> I have noticed that I can extract text from some PDF files with PDFBox 0.7.4, but for
> the same file and the same page, PDFBox 1.1.0 doesn't retrieve any text, or the
> extraction is worse.
> Am I the only one who thinks there is a regression in text extraction?
> My code is like this:
> {code}
> PDDocument document = PDDocument.load("/sdcard/internals.pdf");
> int numberOfPages = document.getNumberOfPages();
> resources = this.getResources();
>
> android.util.Log.d(TEST_PDFBOX, "readerPDF() resources : "+resources); // ANDROID code here to get file
> resourceGlyphList = R.raw.glyphlist;
> InputStream rawResource = resources.openRawResource(R.raw.pdftextstripper); // PDFBOX property file
> android.util.Log.d(TEST_PDFBOX, "readerPDF() rawResource : "+rawResource);
> Properties properties = new Properties();
> properties.load(rawResource);
>
> PDFTextStripper stripper = new PDFTextStripper(properties);
>
> stripper.setStartPage(pageNumber); // 1 or any other page
> stripper.setEndPage(pageNumber); // same page as above
> String s = "Page : "+pageNumber+"<br><br>"+stripper.getText(document);
> android.util.Log.d(TEST_PDFBOX, "readerPDF() stripper extract pages text : "+s);
> {code}
> Maybe I should use page.getContents().getStream() or
> stripper.getTextForRegion( "class1" ) or stripper.writeText(doc, outputStream)
> I want the text as a String, not as a newly created file....
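
On the last question in the report: getText(document), as used in the snippet, already returns a String, and writeText takes a java.io.Writer, so it does not have to create a file. The following is only a minimal sketch of capturing Writer-based output in memory with a StringWriter. The TextProducer interface and the capture helper are hypothetical names introduced for illustration; they stand in for the PDFBox call so the example runs on a plain JVM without PDFBox or Android on the classpath.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class WriterToString {

    /** Hypothetical stand-in for any API that writes extracted text to a Writer. */
    public interface TextProducer {
        void writeTo(Writer out) throws IOException;
    }

    /** Runs the Writer-based call against an in-memory buffer and returns its content. */
    public static String capture(TextProducer producer) {
        StringWriter buffer = new StringWriter();
        try {
            producer.writeTo(buffer);
        } catch (IOException e) {
            // StringWriter itself never fails; this only wraps errors raised by the producer.
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Assumption: with PDFBox the lambda body would be a call such as
        // stripper.writeText(document, out) instead of this placeholder text.
        String text = capture(out -> out.write("Page : 1\nplaceholder text"));
        System.out.println(text);
    }
}
```

Nothing is written to storage in this sketch: the StringWriter keeps the characters in memory until toString() is called.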