Re: "Too Many Open Files" IOException in ScratchFile

Andreas Lehmkuehler Thu, 16 Mar 2023 23:51:31 -0700

On 15.03.23 at 17:51, Gilad Denneboom wrote:

It's a bit more complicated than that. I have a small set of very large
files with different pages matching different people. I need to match those
pages based on some identifying code, and then extract them into either
individual files (one per person) or a single merged file with those pages
sorted by person. But yes, I do close the input files after scanning them,
and then open them later on to extract the relevant pages from them, if
needed. This is actually the reason I opted not to use PDFMergerUtility, as
it would require me to extract all the individual pages as separate files,
so I could merge them later on (as it's not possible to use it to only
merge parts of files).

How about extracting those pages using the splitter? This will produce the
file per person you are looking for. Use the merger to get the summary file.
If there are too many files, use several steps to do the merge.


Andreas
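The "several steps" idea above can be sketched as follows. This is a minimal illustration, not code from PDFBox: the class name StagedMerge, the Merger callback and the maxInputs parameter are all hypothetical names chosen here. The sketch only plans the batches, keeps the input order, and leaves the actual merging to the callback.

```java
import java.util.ArrayList;
import java.util.List;

/** Plans a multi-step merge so that no single step touches more than maxInputs files. */
final class StagedMerge {

    /** Hypothetical callback: merges the inputs in order and returns the path of the result. */
    interface Merger {
        String merge(List<String> inputs, int stage, int batch) throws Exception;
    }

    static String mergeInStages(List<String> inputs, int maxInputs, Merger merger) throws Exception {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("nothing to merge");
        }
        if (maxInputs < 2) {
            throw new IllegalArgumentException("maxInputs must be at least 2");
        }
        List<String> current = new ArrayList<>(inputs);
        int stage = 0;
        // Each stage shrinks the list to ceil(size / maxInputs) entries, so the loop terminates.
        while (current.size() > 1) {
            List<String> next = new ArrayList<>();
            int batch = 0;
            for (int from = 0; from < current.size(); from += maxInputs) {
                int to = Math.min(from + maxInputs, current.size());
                List<String> slice = new ArrayList<>(current.subList(from, to));
                // A lone leftover file needs no merge; carry it forward unchanged.
                next.add(slice.size() == 1 ? slice.get(0) : merger.merge(slice, stage, batch));
                batch++;
            }
            current = next;
            stage++;
        }
        return current.get(0);
    }
}
```

With PDFBox 2.0.x, the callback could presumably wrap PDFMergerUtility (addSource for each input, setDestinationFileName with a temp file name derived from stage and batch, then mergeDocuments), so that each step only has maxInputs files open at once. Intermediate files would still need to be deleted by the caller.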


On Wed, Mar 15, 2023 at 5:28 PM Tilman Hausherr <thaush...@t-online.de>
wrote:

Your text sounded like you're not picking stuff from all documents. Are
you closing the documents where nothing is found at the earliest possible
time?
Tilman
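One way to close each input at the earliest possible time is to scan in a first pass that only records where the matching pages are, and to reopen files later just for extraction. The sketch below shows that pattern in plain Java. ScannedDoc, PageRef and PageIndexer are hypothetical stand-ins invented for this example (a PDFBox PDDocument is Closeable and could sit behind such an interface); how the identifying code is read from a page is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** First pass: record which page belongs to which person, holding one document open at a time. */
final class PageIndexer {

    /** Hypothetical stand-in for an opened input document. */
    interface ScannedDoc extends AutoCloseable {
        /** One entry per page: the identifying code found on it, or null if there is none. */
        List<String> pageCodes();

        @Override
        void close();
    }

    /** Location of a matching page; enough to reopen the file and extract it later. */
    static final class PageRef {
        final String path;
        final int pageIndex;

        PageRef(String path, int pageIndex) {
            this.path = path;
            this.pageIndex = pageIndex;
        }
    }

    static Map<String, List<PageRef>> index(List<String> paths, Function<String, ScannedDoc> opener) {
        // TreeMap keeps the result sorted by person code.
        Map<String, List<PageRef>> byPerson = new TreeMap<>();
        for (String path : paths) {
            // try-with-resources closes the document before the next one is opened,
            // including documents in which nothing was found.
            try (ScannedDoc doc = opener.apply(path)) {
                List<String> codes = doc.pageCodes();
                for (int i = 0; i < codes.size(); i++) {
                    String code = codes.get(i);
                    if (code != null) {
                        byPerson.computeIfAbsent(code, k -> new ArrayList<>()).add(new PageRef(path, i));
                    }
                }
            }
        }
        return byPerson;
    }
}
```

The second pass can then walk the map person by person and open only the files that actually appear in it. Because imported pages in PDFBox keep referring to their source document, each source still has to stay open until the output that uses its pages has been saved.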

On 15.03.2023 17:21, Gilad Denneboom wrote:

The question is, do you close the input files properly?

Yes, I do, but only at the very end of the operation, as I was merging all
these individual files into one large one, so I had to keep the originals
open until I save this merged file for the last time, or it would throw an
exception about the PDDocument being closed.
I know this is not the best way of merging documents, by the way. I might
try to switch to using PDFMergerUtility, instead.

On Wed, Mar 15, 2023 at 8:30 AM Andreas Lehmkuehler <andr...@lehmi.de>
wrote:

Hi Gilad,

PDFBox is using a scratch file per document as long as you are using
setupTempFileOnly. Handling thousands of documents ends up in thousands of
scratch files. Those scratch files should be closed once the corresponding
documents are closed.

The question is, do you close the input files properly?

Andreas

On 14.03.23 at 19:16, Gilad Denneboom wrote:

Hi Maruan,

Yes, I saw that, but it would be nice if this issue can be solved within
PDFBox, too.

Gilad

On Tue, Mar 14, 2023 at 4:52 PM Maruan Sahyoun <sahy...@fileaffairs.de>
wrote:

You can raise the limit with ulimit on Linux; the default is 1024 open files.

BR
Maruan
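To see how close a run is to that limit, the JVM can report its own descriptor usage. The sketch below assumes a HotSpot/OpenJDK runtime on a Unix-like system, where the platform bean implements com.sun.management.UnixOperatingSystemMXBean; on other runtimes it returns null. The class name FdUsage is made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Reports the process's open and maximum file descriptor counts where the JVM exposes them. */
final class FdUsage {

    /** Returns {open, max}, or null if this JVM does not provide the Unix-specific bean. */
    static long[] snapshot() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }
}
```

Logging FdUsage.snapshot() every few hundred documents should show whether the open count keeps climbing (documents or scratch files not being closed) or stays flat.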

On 14.03.2023 at 16:05, Gilad Denneboom <gilad.denneb...@gmail.com> wrote:

Hi all,

I created an application that opens many files (I'm talking thousands),
searches them for specific pages and then merges those pages into new PDF
files. The way I do it is by using the importPage command from the original
files into the split ones.
However, I'm getting an IOException ("Too many open files") from
ScratchFile after several thousand files were processed. I had a look at
the source code for that class and I think it might have to do with a
RandomAccessFile variable ("raf") not being properly closed.
All of the documents are opened using MemoryUsageSetting set to
setupTempFileOnly, by the way.
Could someone confirm this is the issue, and maybe help solve it? I'm using
PDFBox 2.0.26, by the way, and the app runs on a Mac.

The stack-trace:
Exception in thread "main" java.io.IOException: Too many open files
    at java.base/java.io.UnixFileSystem.createFileExclusively0(Native Method)
    at java.base/java.io.UnixFileSystem.createFileExclusively(UnixFileSystem.java:356)
    at java.base/java.io.File.createTempFile(File.java:2179)
    at org.apache.pdfbox.io.ScratchFile.enlarge(ScratchFile.java:217)
    at org.apache.pdfbox.io.ScratchFile.getNewPage(ScratchFile.java:167)
    at org.apache.pdfbox.io.ScratchFileBuffer.addPage(ScratchFileBuffer.java:126)
    at org.apache.pdfbox.io.ScratchFileBuffer.<init>(ScratchFileBuffer.java:84)
    at org.apache.pdfbox.io.ScratchFile.createBuffer(ScratchFile.java:424)
    at org.apache.pdfbox.cos.COSStream.createRawOutputStream(COSStream.java:273)
    at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1140)
    at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:929)
    at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:888)
    at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:800)
    at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:760)
    at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1107)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1090)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1014)
    at MergeStudentRecords_2021.main(MergeStudentRecords_2021.java:324)

Thanks in advance!

Gilad

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org

