[ 
https://issues.apache.org/jira/browse/PDFBOX-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-6002:
---------------------------------------
    Component/s: Parsing

> change parse methods to take CharSequence argument
> --------------------------------------------------
>
>                 Key: PDFBOX-6002
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6002
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>            Reporter: Axel Howind
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>         Attachments: image-2025-05-02-07-00-52-161.png
>
>
> PDFBox parsing works on Strings in almost all places. Often, StringBuilder 
> instances are created to prepare a fragment to parse, and then another parse 
> method is called using the result of calling toString() on the StringBuilder. 
> If the parse methods were changed to take CharSequence instead, the 
> StringBuilder instance could be passed on without creating a temporary String 
> instance. This would reduce memory consumption and load on the GC.
> I did some profiling with async-profiler. For example, in 
> BaseParser.parseCOSNumber() about 25% of the runtime is spent in 
> StringBuilder.toString(), which would be eliminated entirely if the parse 
> methods worked on CharSequences instead of Strings (see image):
> !image-2025-05-02-07-00-52-161.png!
> One consequence is that user code would need to be recompiled against the 
> new version, because the method signatures change. No source changes would 
> be needed on the user side.
> An alternative approach is to introduce new methods with the suffix CS, 
> such as parseCOSNumberCS(), and to have parseCOSNumber() delegate to the 
> new method. This change would be compatible with PDFBox 3.
> Please let me know whether you would accept a patch and, if so, which of 
> the two variants. I would then create incremental patches to provide this 
> functionality.
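
The two options in the description can be illustrated with a minimal sketch.
It is not taken from PDFBox: the class CharSequenceParseSketch and the methods
parseLongCS() and parseLong() are hypothetical names chosen for illustration,
standing in for a CS-named variant such as the parseCOSNumberCS() mentioned
above. The sketch assumes plain decimal integers and does not check for
overflow.

```java
// Hypothetical sketch; these names are illustrative and not part of the PDFBox API.
final class CharSequenceParseSketch {

    // New-style method: accepts any CharSequence (String, StringBuilder, ...),
    // so a caller holding a StringBuilder does not have to call toString() first.
    // Assumption: input is an optionally signed decimal integer; overflow is not checked.
    static long parseLongCS(CharSequence text) {
        int length = text.length();
        if (length == 0) {
            throw new NumberFormatException("empty input");
        }
        int index = 0;
        boolean negative = false;
        char first = text.charAt(0);
        if (first == '-' || first == '+') {
            negative = first == '-';
            index = 1;
        }
        if (index == length) {
            throw new NumberFormatException("no digits: " + text);
        }
        long value = 0;
        for (; index < length; index++) {
            char c = text.charAt(index);
            if (c < '0' || c > '9') {
                throw new NumberFormatException("not a digit: " + c);
            }
            value = value * 10 + (c - '0');
        }
        return negative ? -value : value;
    }

    // Existing-style method: keeps the String signature and delegates,
    // so already compiled callers keep working without recompilation.
    static long parseLong(String text) {
        return parseLongCS(text);
    }

    public static void main(String[] args) {
        StringBuilder buffer = new StringBuilder();
        buffer.append('-').append("12").append('3');
        // The StringBuilder is passed directly; no temporary String is created here.
        System.out.println(parseLongCS(buffer));
        System.out.println(parseLong("42"));
    }
}
```

Keeping the String overload as a thin delegate corresponds to the second
option: existing callers still link against the old signature, while new code
can pass a StringBuilder straight through.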



--
This message was sent by Atlassian Jira
(v8.20.10#820010)