On 31.01.2024 14:48, Lars Juel Jensen wrote:
This creates another problem for me. I am running PDFBox in an on-premises
Kubernetes cluster with limited resources. I cannot set up persistent
volume claims or ephemeral volumes, and I cannot change how my pods are
started. All I have is an emptyDir mounted on /tmp, which is where the
temporary files go. The emptyDir is mapped to a portion of the
Kubernetes node's memory, and this memory is shared with many other
services. All in all, I need to keep a very low memory and temp file
footprint, hence the InputStream. Using RandomAccessReadBuffer with an
InputStream loads the entire PDF into memory, and I can encounter PDF
documents that are over 1 GB in size. So loading everything into memory
is not an option.
You can try to create your own class implementing RandomAccessRead.
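
To illustrate the idea of a custom RandomAccessRead class, here is a
minimal sketch of a seekable reader that keeps only a few fixed-size
pages in memory. It is deliberately standalone: it does not implement the
PDFBox interface, and RangeSource, PagedReader and cachedPages() are
hypothetical names invented for this example. It also assumes the
underlying data source can serve bytes at an arbitrary offset, which a
plain InputStream cannot do.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical abstraction: a source that can serve bytes at any offset. */
interface RangeSource {
    long length();

    /** Copies up to len bytes from source offset pos into buf at off; returns the count read. */
    int readAt(long pos, byte[] buf, int off, int len);
}

/** Seekable reader that holds at most maxPages pages of the source in memory. */
class PagedReader {
    private final RangeSource source;
    private final int pageSize;
    private final Map<Long, byte[]> cache;
    private long position;

    PagedReader(RangeSource source, int pageSize, int maxPages) {
        this.source = source;
        this.pageSize = pageSize;
        // An access-ordered LinkedHashMap evicts the least recently used page.
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxPages;
            }
        };
    }

    long length() {
        return source.length();
    }

    long getPosition() {
        return position;
    }

    void seek(long newPosition) {
        if (newPosition < 0) {
            throw new IllegalArgumentException("negative position");
        }
        position = newPosition;
    }

    /** Returns the next byte as 0..255, or -1 at end of data. */
    int read() {
        if (position >= source.length()) {
            return -1;
        }
        long pageIndex = position / pageSize;
        byte[] page = cache.get(pageIndex);
        if (page == null) {
            page = loadPage(pageIndex);
            cache.put(pageIndex, page);
        }
        int value = page[(int) (position % pageSize)] & 0xFF;
        position++;
        return value;
    }

    /** Number of pages currently held in memory. */
    int cachedPages() {
        return cache.size();
    }

    private byte[] loadPage(long pageIndex) {
        long start = pageIndex * pageSize;
        int len = (int) Math.min(pageSize, source.length() - start);
        byte[] page = new byte[len];
        int filled = 0;
        while (filled < len) {
            int n = source.readAt(start + filled, page, filled, len - filled);
            if (n <= 0) {
                throw new IllegalStateException("source ended early");
            }
            filled += n;
        }
        return page;
    }
}
```

With a page size of 64 KB and 16 pages, such a reader would hold roughly
1 MB regardless of the document size. A real implementation would have to
provide every method the RandomAccessRead interface declares, so check
the PDFBox 3 Javadoc before adapting this.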
If your /tmp is mapped on main memory, then it doesn't make sense to use
a temp file at all, you're just wasting time.
Btw PDFBox 2 was also loading the whole PDF file into memory (or into a
scratch file) and had an even bigger footprint because it was also
parsing the complete PDF. So if your project was working with PDFBox 2
then it should work with PDFBox 3.
Tilman
On Wed, Jan 31, 2024 at 10:10 AM Tilman Hausherr <thaush...@t-online.de>
wrote:
On 31.01.2024 09:50, Lars Juel Jensen wrote:
In PDFBox 2 I could do:
PDDocument.load(inputStream, MemoryUsageSetting.setupTempFileOnly())
But there is no equivalent to this in PDFBox 3. How do I read a PDF from
an InputStream?
Loader.loadPDF(new RandomAccessReadBuffer(inputStream),
IOUtils.createTempFileOnlyStreamCache());