[ https://issues.apache.org/jira/browse/PDFBOX-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler updated PDFBOX-6002:
---------------------------------------
    Component/s: Parsing

> change parse methods to take CharSequence argument
> --------------------------------------------------
>
>                 Key: PDFBOX-6002
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-6002
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>            Reporter: Axel Howind
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>         Attachments: image-2025-05-02-07-00-52-161.png
>
>
> PDFBox parsing works on Strings in almost all places. Often, StringBuilder
> instances are created to prepare a fragment to parse, and then another parse
> method is called using the result of calling toString() on the StringBuilder.
> If the parse methods were changed to take CharSequence instead, the
> StringBuilder instance could be passed on without creating a temporary String
> instance. This would reduce memory consumption and load on the GC.
>
> I did some profiling using the async profiler, and for example in
> BaseParser.parseCOSNumber() about 25% of the runtime is spent in
> StringBuilder().toString() which would be completely eliminated if the parse
> methods worked on CharSequences instead of Strings (see image):
>
> !image-2025-05-02-07-00-52-161.png!
>
> A consequence would be that user code needs to be recompiled (no code changes
> on the user side) against the new version because the method signature
> changes.
>
> An alternative approach is to introduce new methods with the prefix CS, like
> parseCOSNumberCS(), and to delegate parseCOSNumber() to the new method. This
> would be a PDFBox 3 compatible change.
>
> Please let me know if, and if yes, which version of a patch you would
> possibly accept. I'd then create incremental patches to provide this
> functionality.
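To make the delegation alternative from the issue concrete, here is a minimal sketch in plain Java. It is not PDFBox code: the class CharSequenceParseSketch and the methods parseIntegerCS() and parseInteger() are hypothetical stand-ins for a parse method pair such as parseCOSNumberCS()/parseCOSNumber(), and the integer parsing is deliberately simplified (no overflow handling). It only illustrates how a CharSequence-based method can read directly from a StringBuilder while the existing String-based signature stays in place and delegates.

```java
public class CharSequenceParseSketch {

    // Hypothetical CharSequence-based variant (assumption, not a PDFBox method).
    // Reads characters via charAt(), so a StringBuilder can be passed directly
    // without first materializing a temporary String.
    static long parseIntegerCS(CharSequence text) {
        int length = text.length();
        if (length == 0) {
            throw new NumberFormatException("empty input");
        }
        int index = 0;
        boolean negative = false;
        char first = text.charAt(0);
        if (first == '-' || first == '+') {
            negative = first == '-';
            index = 1;
            if (length == 1) {
                throw new NumberFormatException("sign without digits");
            }
        }
        long value = 0;
        for (; index < length; index++) {
            char c = text.charAt(index);
            if (c < '0' || c > '9') {
                throw new NumberFormatException("not a digit: " + c);
            }
            // Simplification: overflow is not checked in this sketch.
            value = value * 10 + (c - '0');
        }
        return negative ? -value : value;
    }

    // Existing-style String method keeps its signature and delegates,
    // so already compiled callers continue to link against it.
    static long parseInteger(String text) {
        return parseIntegerCS(text);
    }

    public static void main(String[] args) {
        StringBuilder buffer = new StringBuilder();
        buffer.append('-').append("12").append('3');

        // The StringBuilder is passed as-is; no toString() call is needed.
        System.out.println(parseIntegerCS(buffer)); // prints -123

        // Callers holding a String keep using the old entry point.
        System.out.println(parseInteger("42"));     // prints 42
    }
}
```

The recompilation concern in the issue follows from how Java links methods: the parameter type is part of the compiled method descriptor, so replacing a String parameter with CharSequence in place would leave already compiled callers without a matching method until they are rebuilt. Adding a second method, as sketched here, avoids that.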