> The question is, do you close the input files properly?

Yes, I do, but only at the very end of the operation. I am merging all of these individual files into one large file, so I have to keep the originals open until the merged file has been saved for the last time; otherwise PDFBox throws an exception about the PDDocument being closed. I know this is not the best way of merging documents, by the way. I might try switching to PDFMergerUtility instead.

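For what it's worth, one way to keep the number of open scratch files bounded while still closing sources only after a save could look roughly like the sketch below. It is not PDFBox code: BatchedSources, track() and releaseAll() are hypothetical names made up for illustration, and it relies only on the fact that PDDocument implements Closeable. It also assumes the output is written as separate part files, one per batch, which are combined afterwards, because pages imported from a source stay tied to that source until the document containing them has been saved.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper (not part of PDFBox): tracks opened sources and
 * closes them in one go once the current batch has been saved.
 */
final class BatchedSources implements Closeable {
    private final Deque<Closeable> open = new ArrayDeque<>();
    private final int maxOpen;

    BatchedSources(int maxOpen) {
        if (maxOpen < 1) {
            throw new IllegalArgumentException("maxOpen must be >= 1");
        }
        this.maxOpen = maxOpen;
    }

    /**
     * Registers an opened source. Returns true once the batch is full,
     * meaning the caller should save its current part and call releaseAll().
     */
    boolean track(Closeable source) {
        open.push(source);
        return open.size() >= maxOpen;
    }

    /** Number of sources currently held open. */
    int openCount() {
        return open.size();
    }

    /**
     * Closes every tracked source, most recently opened first.
     * Call this only after the part that uses them has been saved.
     */
    void releaseAll() {
        IOException first = null;
        while (!open.isEmpty()) {
            try {
                open.pop().close();
            } catch (IOException e) {
                // Keep closing the remaining sources; report all failures together.
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw new UncheckedIOException(first);
        }
    }

    @Override
    public void close() {
        releaseAll();
    }
}
```

In the merge loop, each loaded source would be passed to track(); when it returns true, the current part is saved and releaseAll() is called before the next source is loaded, so no more than maxOpen sources (and their scratch files) are open at any time.
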
On Wed, Mar 15, 2023 at 8:30 AM Andreas Lehmkuehler <andr...@lehmi.de> wrote:

> Hi Gilad,
>
> PDFBox is using a scratch file per document as long as you are using
> setupTempFileOnly. Handling thousands of documents ends up in thousands of
> scratch files. Those scratch files should be closed once the corresponding
> documents are closed.
>
> The question is, do you close the input files properly?
>
> Andreas
>
> On 14.03.23 at 19:16, Gilad Denneboom wrote:
> > Hi Maruan,
> >
> > Yes, I saw that, but it would be nice if this issue can be solved within
> > PDFBox, too.
> >
> > Gilad
> >
> > On Tue, Mar 14, 2023 at 4:52 PM Maruan Sahyoun <sahy...@fileaffairs.de>
> > wrote:
> >
> >> You can set the ulimit on Linux - the default is 1024 open files.
> >>
> >> BR
> >> Maruan
> >>
> >>> On 14.03.2023 at 16:05, Gilad Denneboom <gilad.denneb...@gmail.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I created an application that opens many files (I'm talking thousands),
> >>> searching them for specific pages and then merges those pages into new PDF
> >>> files. The way I do it is by using the importPage command from the original
> >>> files into the split ones.
> >>> However, I'm getting an IOException ("Too many open files") from
> >>> ScratchFile after several thousand files were processed. I had a look at
> >>> the source code for that class and I think it might have to do with a
> >>> RandomAccessFile variable ("raf") not being properly closed.
> >>> All of the documents are opened using MemoryUsageSetting set to
> >>> setupTempFileOnly, by the way.
> >>> Could someone confirm this is the issue, and maybe help solve it? I'm using
> >>> PDFBox 2.0.26, by the way, and the app runs on a Mac.
> >>>
> >>> The stack-trace:
> >>> Exception in thread "main" java.io.IOException: Too many open files
> >>>     at java.base/java.io.UnixFileSystem.createFileExclusively0(Native Method)
> >>>     at java.base/java.io.UnixFileSystem.createFileExclusively(UnixFileSystem.java:356)
> >>>     at java.base/java.io.File.createTempFile(File.java:2179)
> >>>     at org.apache.pdfbox.io.ScratchFile.enlarge(ScratchFile.java:217)
> >>>     at org.apache.pdfbox.io.ScratchFile.getNewPage(ScratchFile.java:167)
> >>>     at org.apache.pdfbox.io.ScratchFileBuffer.addPage(ScratchFileBuffer.java:126)
> >>>     at org.apache.pdfbox.io.ScratchFileBuffer.<init>(ScratchFileBuffer.java:84)
> >>>     at org.apache.pdfbox.io.ScratchFile.createBuffer(ScratchFile.java:424)
> >>>     at org.apache.pdfbox.cos.COSStream.createRawOutputStream(COSStream.java:273)
> >>>     at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1140)
> >>>     at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:929)
> >>>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:888)
> >>>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:800)
> >>>     at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:760)
> >>>     at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
> >>>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
> >>>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1107)
> >>>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1090)
> >>>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1014)
> >>>     at MergeStudentRecords_2021.main(MergeStudentRecords_2021.java:324)
> >>>
> >>> Thanks in advance!
> >>>
> >>> Gilad
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> >> For additional commands, e-mail: users-h...@pdfbox.apache.org