Let's say your software runs without hitting the problem. Are there any
pdfbox*.tmp files left in your temp directory? If so, that would mean you're
not closing the input files, as Andreas suspects (or that there is a bug
in our software that doesn't occur in the build tests).
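
One way to run that check is sketched below. It is only an illustration, not part of PDFBox: the class and method names are made up, it assumes the scratch files were written to the JVM's default temp directory (java.io.tmpdir), and it matches the pdfbox*.tmp name case-insensitively because the exact capitalization of the prefix is an assumption.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not a PDFBox class.
final class LeftoverScratchFiles {

    /** Returns the entries of dir whose names look like pdfbox*.tmp (case-insensitive). */
    static List<Path> findLeftovers(Path dir) throws IOException {
        List<Path> leftovers = new ArrayList<>();
        // The directory stream is itself a file handle, so close it via try-with-resources.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString().toLowerCase(Locale.ROOT);
                if (name.startsWith("pdfbox") && name.endsWith(".tmp")) {
                    leftovers.add(entry);
                }
            }
        }
        return leftovers;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: no custom temp directory was configured for the scratch files.
        Path tempDir = Paths.get(System.getProperty("java.io.tmpdir"));
        List<Path> leftovers = findLeftovers(tempDir);
        System.out.println(leftovers.size() + " leftover scratch file(s) in " + tempDir);
        for (Path p : leftovers) {
            System.out.println("  " + p);
        }
    }
}
```

Run it after a batch has finished. Files that are still listed at that point suggest documents that were never closed.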

Tilman



--- Original Message ---
From: Andreas Lehmkuehler
Subject: Re: "Too Many Open Files" IOException in ScratchFile
Date: March 15, 2023, 8:29
To: users@pdfbox.apache.org




Hi Gilad,

PDFBox uses one scratch file per document when you use
setupTempFileOnly, so handling thousands of documents produces thousands of
scratch files. Each scratch file should be closed when the corresponding
document is closed.

The question is, do you close the input files properly?
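
A minimal sketch of one way to make sure every input gets closed is shown below. OpenSources is a hypothetical helper, not a PDFBox class. It relies only on the fact that PDDocument implements java.io.Closeable. The PDFBox calls in the trailing comment are untested here, and the idea that the inputs should stay open until the merged file has been saved is an assumption about how imported pages behave, not something confirmed in this thread.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: remembers resources that must stay open, then closes them all. */
final class OpenSources implements Closeable {
    private final Deque<Closeable> open = new ArrayDeque<>();

    /** Remembers the resource and returns it so the call can be chained. */
    <T extends Closeable> T register(T resource) {
        open.push(resource);
        return resource;
    }

    /** Number of resources that are still waiting to be closed. */
    int size() {
        return open.size();
    }

    /** Closes every registered resource, most recent first, and rethrows the first failure. */
    @Override
    public void close() throws IOException {
        IOException first = null;
        while (!open.isEmpty()) {
            try {
                open.pop().close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}

// Hypothetical usage with PDFBox 2.0.x (not compiled as part of this sketch):
//
// try (PDDocument target = new PDDocument(); OpenSources sources = new OpenSources()) {
//     for (File f : batch) {
//         PDDocument src = sources.register(
//                 PDDocument.load(f, MemoryUsageSetting.setupTempFileOnly()));
//         target.importPage(src.getPage(0));
//     }
//     target.save(outFile);   // save while the inputs are still open
// }                           // inputs are closed here, then the target
```

Processing the inputs in batches, with one such block per output file, keeps the number of simultaneously open scratch files bounded by the batch size.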

Andreas

On 14.03.23 at 19:16, Gilad Denneboom wrote:
> Hi Maruan,
>
> Yes, I saw that, but it would be nice if this issue could be solved within
> PDFBox, too.
>
> Gilad
>
> On Tue, Mar 14, 2023 at 4:52 PM Maruan Sahyoun <sahy...@fileaffairs.de> wrote:
>
>> You can raise the ulimit on Linux; the default is 1024 open files.
>>
>> BR
>> Maruan
>>
>>> On 14.03.2023 at 16:05, Gilad Denneboom <gilad.denneb...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> I created an application that opens many files (I'm talking thousands),
>>> searches them for specific pages, and then merges those pages into new PDF
>>> files. I do this by calling importPage to copy pages from the original
>>> files into the split ones.
>>> However, I'm getting an IOException ("Too many open files") from
>>> ScratchFile after several thousand files have been processed. I had a look
>>> at the source code for that class and I think it might have to do with a
>>> RandomAccessFile variable ("raf") not being properly closed.
>>> All of the documents are opened using MemoryUsageSetting set to
>>> setupTempFileOnly, by the way.
>>> Could someone confirm this is the issue, and maybe help solve it? I'm using
>>> PDFBox 2.0.26, by the way, and the app runs on a Mac.
>>>
>>> The stack-trace:
>>> Exception in thread "main" java.io.IOException: Too many open files
>>> at java.base/java.io.UnixFileSystem.createFileExclusively0(Native Method)
>>> at java.base/java.io.UnixFileSystem.createFileExclusively(UnixFileSystem.java:356)
>>> at java.base/java.io.File.createTempFile(File.java:2179)
>>> at org.apache.pdfbox.io.ScratchFile.enlarge(ScratchFile.java:217)
>>> at org.apache.pdfbox.io.ScratchFile.getNewPage(ScratchFile.java:167)
>>> at org.apache.pdfbox.io.ScratchFileBuffer.addPage(ScratchFileBuffer.java:126)
>>> at org.apache.pdfbox.io.ScratchFileBuffer.<init>(ScratchFileBuffer.java:84)
>>> at org.apache.pdfbox.io.ScratchFile.createBuffer(ScratchFile.java:424)
>>> at org.apache.pdfbox.cos.COSStream.createRawOutputStream(COSStream.java:273)
>>> at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1140)
>>> at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:929)
>>> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:888)
>>> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:800)
>>> at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:760)
>>> at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
>>> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
>>> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1107)
>>> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1090)
>>> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1014)
>>> at MergeStudentRecords_2021.main(MergeStudentRecords_2021.java:324)
>>>
>>> Thanks in advance!
>>>
>>> Gilad
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
>> For additional commands, e-mail: users-h...@pdfbox.apache.org
>>
>>
>

