That is weird. The source file I am looking at for version 3.0.1 does not pass the StreamCacheCreateFunction on to COSParser:

https://github.com/apache/pdfbox/blob/3.0.1/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/PDFParser.java#L91

On Wed, Jan 31, 2024 at 4:57 PM Tilman Hausherr <thaush...@t-online.de> wrote:

> On 31.01.2024 16:19, Lars Juel Jensen wrote:
> > Well that's my problem.. It works with PDFBox2 with reasonable sized files.
> > When it comes to the big ones it crashes.. So reading the migration guide
> > for PDFBox3.0 I thought I saw some light in the tunnel as it says I can
> > create my own reader and stream cache. I see that I can provide my own
> > RandomAccessReader when I call Loader.loadPDF, but the loadPDF method that
> > takes a StreamCacheCreate function does not work as promised as the
> > StreamCacheCreateFunction is not passed from PDFParser to COSParser in the
> > PDFParser constructor. This works in v3.0.0, but not in v3.0.1. I guess
> > this is a bug?
>
> I don't know if there is a bug, but it is passed:
>
>     public PDFParser(RandomAccessRead source, String decryptionPassword,
>             InputStream keyStore, String alias,
>             StreamCacheCreateFunction streamCacheCreateFunction) throws IOException
>     {
>         super(source, decryptionPassword, keyStore, alias,
>                 streamCacheCreateFunction);
>     }
>
> and here's COSParser:
>
>     public COSParser(RandomAccessRead source, String password,
>             InputStream keyStore, String keyAlias,
>             StreamCacheCreateFunction streamCacheCreateFunction) throws IOException
>     {
>         super(source);
>         this.password = password;
>         this.keyAlias = keyAlias;
>         fileLen = source.length();
>         keyStoreInputStream = keyStore;
>         init(streamCacheCreateFunction);
>     }
>
> If you think 3.0.1 has a bigger memory footprint than 3.0.0, can you
> create a scenario to reproduce this? Preferably without using a container.
>
> Tilman
>
> > On Wed, Jan 31, 2024 at 3:46 PM Tilman Hausherr <thaush...@t-online.de>
> > wrote:
> >
> >> On 31.01.2024 14:48, Lars Juel Jensen wrote:
> >>> This creates another problem for me. I am running PDFBox in a kubernetes
> >>> cluster on premises with limited resources. I can not setup persistent
> >>> volume claims nor ephemeral volumes, and I can not change how my pods are
> >>> started. I have limited resources and an emptyDir that is mounted on /tmp
> >>> where the temporary files go. The emptyDir is mapped to a portion of the
> >>> kubernetes node's memory, and this memory is shared with many other
> >>> services. All in all - I need to keep a very low memory and tempFile
> >>> footprint, hence the InputStream. Using RandomAccessReadBuffer with an
> >>> InputStream loads the entire PDF into memory, and I can encounter PDF
> >>> documents that can be over 1GB in size. So loading everything into memory
> >>> is not an option.
> >> You can try to create your own class extending RandomAccessRead.
> >>
> >> If your /tmp is mapped on main memory, then it doesn't make sense to use
> >> a temp file at all, you're just wasting time.
> >>
> >> Btw PDFBox 2 was also loading the whole PDF file into memory (or into a
> >> scratch file) and had an even bigger footprint because it was also
> >> parsing the complete PDF. So if your project was working with PDFBox 2
> >> then it should work with PDFBox 3.
> >>
> >> Tilman
> >>
> >>> On Wed, Jan 31, 2024 at 10:10 AM Tilman Hausherr <thaush...@t-online.de>
> >>> wrote:
> >>>
> >>>> On 31.01.2024 09:50, Lars Juel Jensen wrote:
> >>>>> In PDFBox2 I could do:
> >>>>>
> >>>>> PDDocument.load(inputStream, MemoryUsageSetting.setupTempFileOnly())
> >>>>>
> >>>>> But there is no equivalent to this in PDFBox3. How do I read a PDF from
> >>>>> an inputstream?
> >>>>>
> >>>> Loader.loadPDF(new RandomAccessReadBuffer(inputStream),
> >>>> IOUtils.createTempFileOnlyStreamCache());
> >>>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> >> For additional commands, e-mail: users-h...@pdfbox.apache.org
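
The suggestion in the quoted thread to write a custom RandomAccessRead comes down to answering seek-and-read requests from a small window instead of from a full in-memory copy of the PDF. Below is a minimal sketch of that idea in plain Java. It deliberately does not implement the PDFBox interface: `WindowedReader` and its method names are hypothetical, chosen only for illustration, and it assumes the data is reachable through a seekable channel, which a bare InputStream is not.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;

/** Hypothetical helper (not a PDFBox class): random access through one small window. */
final class WindowedReader implements Closeable {
    private final SeekableByteChannel channel;
    private final ByteBuffer window;   // the only data bytes held in memory
    private final long length;         // total size of the underlying data
    private long windowStart = 0;      // offset in the data of window byte 0
    private long position = 0;         // current logical read position

    WindowedReader(SeekableByteChannel channel, int windowSize) throws IOException {
        this.channel = channel;
        this.length = channel.size();
        this.window = ByteBuffer.allocate(windowSize);
        this.window.limit(0);          // start empty so the first read fills it
    }

    long length() {
        return length;
    }

    long getPosition() {
        return position;
    }

    /** Moves the read position; the window is refilled lazily on the next read. */
    void seek(long newPosition) {
        if (newPosition < 0) {
            throw new IllegalArgumentException("negative position");
        }
        position = newPosition;
    }

    /** Returns the next byte as 0..255, or -1 at the end of the data. */
    int read() throws IOException {
        if (position >= length) {
            return -1;
        }
        boolean inWindow = position >= windowStart
                && position < windowStart + window.limit();
        if (!inWindow && !fill()) {
            return -1;
        }
        int value = window.get((int) (position - windowStart)) & 0xFF;
        position++;
        return value;
    }

    /** Reloads the window so that it starts at the current position. */
    private boolean fill() throws IOException {
        window.clear();
        channel.position(position);
        while (window.hasRemaining()) {
            if (channel.read(window) < 0) {
                break;                 // reached the end of the data
            }
        }
        window.flip();
        windowStart = position;
        return window.limit() > 0;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

With a channel from `Files.newByteChannel(path)` and a window of a few kilobytes, memory use stays near the window size regardless of file size. The cost is a refill whenever a seek leaves the window, so a parser that jumps around a lot would likely want more than one window.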