[ https://issues.apache.org/jira/browse/TIKA-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr resolved TIKA-1907.
-----------------------------------
    Fix Version/s: 3.0.0
                       (was: 3.0.1)
         Assignee: Tilman Hausherr
       Resolution: Fixed

> Big Pdf parsing to text - Out of memory
> ---------------------------------------
>
>                 Key: TIKA-1907
>                 URL: https://issues.apache.org/jira/browse/TIKA-1907
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.12
>            Reporter: Nicolas Daniels
>            Assignee: Tilman Hausherr
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Linked to PDFBox issue: [https://issues.apache.org/jira/browse/PDFBOX-3284]
> I'm duplicating it here to make sure it will be fixed in Tika as well. Maybe PDFBox is not the appropriate library for such a case.
> Trying to read the same PDF using Tika leads to the same problem:
> {code:title=Test.java|borderStyle=solid}
> @Test
> public void testParsePdf_Content_Memory() throws Exception {
>     // Write the extracted text straight to a file instead of holding it in memory.
>     // try-with-resources closes both the input stream and the writer, even on failure.
>     try (InputStream inputStream = new FileInputStream("c:/tmp/sr2015_mx_clearing_3dot0_mdr2_solution.pdf");
>          FileWriter fileWriter = new FileWriter(new File("c:/tmp/test.txt"))) {
>         BodyContentHandler handler = new BodyContentHandler(fileWriter);
>         Metadata metadata = new Metadata();
>         new PDFParser().parse(inputStream, handler, metadata, new ParseContext());
>     }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)