That is weird. The source file I am looking at for version 3.0.1 does not
pass it:
https://github.com/apache/pdfbox/blob/3.0.1/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/PDFParser.java#L91
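
On the InputStream side of this thread, here is a minimal sketch of one way to
avoid holding a whole PDF on the heap: spool the stream to a file through a
single fixed-size buffer. It uses only the JDK. The class StreamSpooler and the
method spoolToFile are hypothetical names, not part of PDFBox, and the sketch
assumes the target directory is on real disk; on a memory-backed emptyDir the
file still counts against node memory, as noted further down.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not a PDFBox class.
final class StreamSpooler {
    private StreamSpooler() {}

    /**
     * Copies the stream into a new temp file inside dir using one
     * fixed-size buffer, so heap use stays close to bufferSize no
     * matter how large the input is.
     */
    static Path spoolToFile(InputStream in, Path dir, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        Path target = Files.createTempFile(dir, "pdf-spool-", ".pdf");
        try (OutputStream out = Files.newOutputStream(target)) {
            byte[] buffer = new byte[bufferSize];
            int n;
            // read() returns -1 at end of stream
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException | RuntimeException e) {
            // Do not leave a partial file behind if the copy fails.
            Files.deleteIfExists(target);
            throw e;
        }
        // The caller could then open the file, e.g. with
        // Loader.loadPDF(target.toFile()) in PDFBox 3, and delete it afterwards.
        return target;
    }
}
```

The caller owns the returned file and should delete it once the document is
closed.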

On Wed, Jan 31, 2024 at 4:57 PM Tilman Hausherr <thaush...@t-online.de>
wrote:

> On 31.01.2024 16:19, Lars Juel Jensen wrote:
> > Well, that's my problem. It works with PDFBox 2 with reasonably sized
> > files. When it comes to the big ones it crashes. So reading the migration
> > guide for PDFBox 3.0 I thought I saw some light at the end of the tunnel,
> > as it says I can create my own reader and stream cache. I see that I can
> > provide my own RandomAccessRead when I call Loader.loadPDF, but the
> > loadPDF method that takes a StreamCacheCreateFunction does not work as
> > promised, as the StreamCacheCreateFunction is not passed from PDFParser
> > to COSParser in the PDFParser constructor. This works in v3.0.0, but not
> > in v3.0.1. I guess this is a bug?
>
> I don't know if there is a bug, but it is passed:
>
>      public PDFParser(RandomAccessRead source, String decryptionPassword, InputStream keyStore,
>              String alias, StreamCacheCreateFunction streamCacheCreateFunction) throws IOException
>      {
>          super(source, decryptionPassword, keyStore, alias, streamCacheCreateFunction);
>      }
>
> and here's COSParser:
>
>      public COSParser(RandomAccessRead source, String password, InputStream keyStore,
>              String keyAlias, StreamCacheCreateFunction streamCacheCreateFunction) throws IOException
>      {
>          super(source);
>          this.password = password;
>          this.keyAlias = keyAlias;
>          fileLen = source.length();
>          keyStoreInputStream = keyStore;
>          init(streamCacheCreateFunction);
>      }
>
> If you think 3.0.1 has a bigger memory footprint than 3.0.0, can you
> create a scenario to reproduce this? Preferably without using a container.
>
> Tilman
>
> >
> > On Wed, Jan 31, 2024 at 3:46 PM Tilman Hausherr <thaush...@t-online.de>
> > wrote:
> >
> >> On 31.01.2024 14:48, Lars Juel Jensen wrote:
> >>> This creates another problem for me. I am running PDFBox in an
> >>> on-premises Kubernetes cluster with limited resources. I cannot set up
> >>> persistent volume claims or ephemeral volumes, and I cannot change how
> >>> my pods are started. I have an emptyDir mounted on /tmp, which is where
> >>> the temporary files go. The emptyDir is mapped to a portion of the
> >>> Kubernetes node's memory, and this memory is shared with many other
> >>> services. All in all, I need to keep a very low memory and temp-file
> >>> footprint, hence the InputStream. Using RandomAccessReadBuffer with an
> >>> InputStream loads the entire PDF into memory, and I can encounter PDF
> >>> documents that are over 1 GB in size. So loading everything into memory
> >>> is not an option.
> >> You can try to create your own class extending RandomAccessRead.
> >>
> >> If your /tmp is mapped on main memory, then it doesn't make sense to use
> >> a temp file at all, you're just wasting time.
> >>
> >> Btw PDFBox 2 was also loading the whole PDF file into memory (or into a
> >> scratch file) and had an even bigger footprint because it was also
> >> parsing the complete PDF. So if your project was working with PDFBox 2
> >> then it should work with PDFBox 3.
> >>
> >> Tilman
> >>
> >>
> >>
> >>> On Wed, Jan 31, 2024 at 10:10 AM Tilman Hausherr <thaush...@t-online.de>
> >>> wrote:
> >>>
> >>>> On 31.01.2024 09:50, Lars Juel Jensen wrote:
> >>>>> In PDFBox2 I could do:
> >>>>>
> >>>>> PDDocument.load(inputStream, MemoryUsageSetting.setupTempFileOnly())
> >>>>>
> >>>>> But there is no equivalent to this in PDFBox 3. How do I read a PDF
> >>>>> from an InputStream?
> >>>>>
> >>>> Loader.loadPDF(new RandomAccessReadBuffer(inputStream),
> >>>>         IOUtils.createTempFileOnlyStreamCache());
> >>>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> >> For additional commands, e-mail: users-h...@pdfbox.apache.org
> >>
> >>
>
>
