[ https://issues.apache.org/jira/browse/PDFBOX-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kabir Soneja updated PDFBOX-6010:
---------------------------------
    Description:
Hi,

I am working on extracting images from a PDF using PDFBox version 2.0.34. We use our own recursive logic that walks all PDResources of each page and checks every object on the page to filter out the images. This recursive logic has a maximum depth of 25 to avoid infinite recursion.

When I run image extraction on the same PDF using the CLI, the image is extracted within a second. This indicates that the extraction logic in the PDFBox source code handles the case, using an ImageGraphicsEngine defined within the source code.

Can you help me understand:
 * Does PDFBox provide an API directly for image extraction?
 * Is there any way to reuse the image extraction logic within the source code, i.e., is it exposed as a public API?
 * Are there any other suggestions for handling image extraction gracefully, with or without recursion?
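For illustration, a traversal like the one described above can be made to terminate by remembering which objects it has already visited, in addition to the depth cap. The sketch below is not PDFBox code and not the reporter's implementation: `Node`, `collectImages`, and `walk` are hypothetical stand-ins for a resource tree in which a container (for example a form) can refer back to one of its ancestors.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class ResourceWalk {

    /** Hypothetical stand-in for a resource: either an image or a container of further resources. */
    static final class Node {
        final String name;
        final boolean image;
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean image) {
            this.name = name;
            this.image = image;
        }
    }

    /** Collects the names of images reachable from root, visiting each object at most once. */
    static List<String> collectImages(Node root, int maxDepth) {
        // Identity-based set: two references to the same object count as one visit.
        Set<Node> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        List<String> found = new ArrayList<>();
        walk(root, 0, maxDepth, visited, found);
        return found;
    }

    private static void walk(Node node, int depth, int maxDepth,
                             Set<Node> visited, List<String> found) {
        // Stop when the depth cap is exceeded or this exact object was already seen
        // (a cycle, or a resource shared by several containers).
        if (depth > maxDepth || !visited.add(node)) {
            return;
        }
        if (node.image) {
            found.add(node.name);
            return;
        }
        for (Node child : node.children) {
            walk(child, depth + 1, maxDepth, visited, found);
        }
    }

    public static void main(String[] args) {
        Node page = new Node("page", false);
        Node form = new Node("form", false);
        Node img = new Node("Im0", true);
        page.children.add(form);
        page.children.add(img);
        form.children.add(img);   // shared image, reported only once
        form.children.add(page);  // cycle back to the page
        System.out.println(collectImages(page, 25)); // prints [Im0]
    }
}
```

With the visited set in place, the depth cap acts only as a safety net: the cycle from the form back to the page is cut on the first revisit instead of being followed until depth 25 is reached.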
> PDF Image Extraction resulting in an infinite recursion
> -------------------------------------------------------
>
>                 Key: PDFBOX-6010
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6010
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Kabir Soneja
>            Priority: Major