Axel Howind created PDFBOX-6002:
-----------------------------------

Summary: change parse methods to take CharSequence argument
Key: PDFBOX-6002
URL: https://issues.apache.org/jira/browse/PDFBOX-6002
Project: PDFBox
Issue Type: Improvement
Reporter: Axel Howind
Attachments: image-2025-05-02-07-00-52-161.png

PDFBox parsing works on Strings in almost all places. Often, a StringBuilder is created to prepare a fragment to parse, and then another parse method is called with the result of calling toString() on that StringBuilder. If the parse methods were changed to take a CharSequence instead, the StringBuilder could be passed on directly without creating a temporary String. This would reduce memory consumption and the load on the GC.

I did some profiling using the async profiler. In BaseParser.parseCOSNumber(), for example, about 25% of the runtime is spent in StringBuilder.toString(), which would be eliminated completely if the parse methods worked on CharSequences instead of Strings (see image):

!image-2025-05-02-07-00-52-161.png!

A consequence would be that user code needs to be recompiled against the new version, because the method signatures change. No code changes would be required on the user side.

An alternative approach is to introduce new methods with the suffix CS, such as parseCOSNumberCS(), and to have parseCOSNumber() delegate to the new method. This would be a PDFBox 3 compatible change.

Please let me know whether you would accept a patch and, if so, which of the two variants. I would then create incremental patches to provide this functionality.
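
To make the compatible alternative concrete, here is a minimal sketch of the delegation pattern. It is not PDFBox code: the class CharSequenceParseSketch and the methods parseLongCS(), parseLong() and readNumber() are hypothetical names chosen for illustration, and the sketch assumes Java 9 or later for Long.parseLong(CharSequence, int, int, int).

```java
// Hypothetical sketch of the "CS" delegation idea; this is not PDFBox code.
// Assumes Java 9+, which added Long.parseLong(CharSequence, int, int, int).
final class CharSequenceParseSketch {

    private CharSequenceParseSketch() {
    }

    /** New entry point: parses directly from any CharSequence, e.g. a StringBuilder. */
    static long parseLongCS(CharSequence text) {
        return Long.parseLong(text, 0, text.length(), 10);
    }

    /** Existing String-based signature stays in place and only delegates. */
    static long parseLong(String text) {
        return parseLongCS(text);
    }

    /** Typical caller: collects an optionally signed run of digits and parses it. */
    static long readNumber(CharSequence input, int start) {
        StringBuilder buffer = new StringBuilder();
        int i = start;
        if (i < input.length() && (input.charAt(i) == '-' || input.charAt(i) == '+')) {
            buffer.append(input.charAt(i++));
        }
        while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
            buffer.append(input.charAt(i++));
        }
        // The StringBuilder is passed on as-is; no toString() call is needed.
        return parseLongCS(buffer);
    }
}
```

Because the String-based method keeps its signature, existing callers would continue to link against it unchanged, while callers that already hold a StringBuilder could switch to the CS variant and skip the intermediate String.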