[ 
https://issues.apache.org/jira/browse/PDFBOX-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724483#comment-17724483
 ] 

Tilman Hausherr edited comment on PDFBOX-5606 at 5/20/23 8:18 AM:
------------------------------------------------------------------

Please submit Java code that reproduces an out-of-memory situation. At this 
time, all your chart shows is different GC behavior, and all my tests show is 
that 2.0.28 uses 1 MB more but does the job with 15 MB, which is pretty low 
(compared to what we need for rendering). It's unclear whether you're saying 
that you actually run out of memory after using 17 GB, or whether you're just 
worried by what the chart shows.
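
One way to tell a real leak from different GC behavior is to sample the used 
heap after a requested collection on every round of the workload. The 
following is only a minimal sketch under assumptions: HeapProbe is a 
hypothetical helper (not part of PDFBox), the workload in main is a 
placeholder standing in for the text extraction loop, and System.gc() is only 
a hint to the JVM, so the readings are approximate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: samples the used heap after each
// round of a workload so that a steady upward trend (a possible leak) can be
// told apart from ordinary garbage-collector sawtooth behavior.
final class HeapProbe {

    // Used heap in bytes after asking the JVM to collect garbage.
    // System.gc() is only a hint, so the reading is approximate.
    static long usedHeapAfterGc() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs the workload `rounds` times and records the used heap after each round.
    static List<Long> sample(Runnable workload, int rounds) {
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            workload.run();
            samples.add(usedHeapAfterGc());
        }
        return samples;
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption): allocates short-lived garbage only.
        // In a real reproduction this would be the PDFBox text extraction loop.
        Runnable workload = () -> {
            byte[][] junk = new byte[64][];
            for (int i = 0; i < junk.length; i++) {
                junk[i] = new byte[16 * 1024];
            }
        };
        List<Long> samples = sample(workload, 5);
        for (int i = 0; i < samples.size(); i++) {
            System.out.printf("round %d: %d KB used after GC%n", i, samples.get(i) / 1024);
        }
    }
}
```

If the readings level off, the chart most likely reflects GC timing only. If 
they keep climbing round after round, running the same loop with a small -Xmx 
until it ends in an OutOfMemoryError would be the actual reproduction.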


> PDFTextStripper runs out of memory in 2.0.28 but not in 2.0.27 same code
> ------------------------------------------------------------------------
>
>                 Key: PDFBOX-5606
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5606
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.28
>            Reporter: Joe Li
>            Priority: Major
>              Labels: memory-bug
>         Attachments: 590031dc-2131-4a00-a936-d1175b7b926c.pdf, 
> pdfbox-2.0.27.png, pdfbox-2.0.28.png
>
>
> Given the following simplified Groovy code (for succinctness over Java)
>  
> {code:java}
> // Groovy 4.0.12
> import org.apache.pdfbox.pdmodel.PDDocument
> import org.apache.pdfbox.pdmodel.PDPage
> import org.apache.pdfbox.text.PDFTextStripperByArea
> import java.awt.geom.Rectangle2D
> int GRID_WIDTH = 10
> int GRID_HEIGHT = 10
> PDDocument.load(new File('./test.pdf')).withCloseable { doc ->
>     doc.pages.eachWithIndex { PDPage page, int pageIndex ->
>         int rows = Math.ceil((page.mediaBox.height as int) /GRID_HEIGHT)
>         int columns = Math.ceil((page.mediaBox.width as int) /GRID_WIDTH)
>         println "processing page $pageIndex, rows = $rows, columns = $columns"
>         def rectangles = [:]
>         (0..<rows).each {rowIndex ->
>             (0..<columns).each { colIndex ->
>                 rectangles["${rowIndex * columns + colIndex}"] =
>                     new Rectangle2D.Float(colIndex * GRID_WIDTH, rowIndex * GRID_HEIGHT,
>                                           GRID_WIDTH, GRID_HEIGHT)
>             }
>         }
>         rectangles.each { key, rect ->
>             PDFTextStripperByArea textStripper = new PDFTextStripperByArea()
>             textStripper.addRegion(key, rect)
>             textStripper.extractRegions(page)
>         }
>     }
> }{code}
>  
>  
> PDFBox version 2.0.28 uses ever-increasing memory, but version 2.0.27 does 
> not. 
> The test.pdf file I am using can be downloaded from Apple's SEC filings page 
> (an `8-K` filing) at [https://investor.apple.com/sec-filings/default.aspx], 
> but any PDF with 10+ pages and a lot of text will work. 
> I have attached profiler screenshots of the difference. 
> Thanks in advance for your help. 


