Tested the new snapshot. Performance looks good. Cache file excerpt:

➜ ~ grep -i NotoSansKannada .pdfbox.cache
*skipexception*|TTF||0|0|0|0|0||/System/Library/Fonts/NotoSansKannada.ttc|b930924c|1700331239000

BR Kjetil

Tue, 5 Dec 2023 at 15:10, Tilman Hausherr <thaush...@t-online.de> wrote:

> Thanks for the feedback. It turns out that there's another error
> (checksum was empty because MessageDigest doesn't support CRC32), which
> has been fixed now, please test again (delete the file first). The
> second-to-last field should now not be empty.
>
> It also teaches an important lesson: a "// never happens" segment should
> have an output.
>
> Tilman
>
> On 05.12.2023 11:34, Kjetil Ødegaard wrote:
> > Nice! Tested it now and I can confirm that it fixes the issue. I see good
> > performance even from the first operation.
> >
> > Checked the cache file and there is a line for this font there now:
> >
> > ➜ ~ grep -i NotoSansKannada .pdfbox.cache
> > *skipexception*|TTF||0|0|0|0|0||/System/Library/Fonts/NotoSansKannada.ttc||1700331239000
> >
> > Thanks for the quick response, great work!
> >
> > BR Kjetil
> >
> > Tue, 5 Dec 2023 at 09:55, Tilman Hausherr <thaush...@t-online.de> wrote:
> >
> >> Thanks, new snapshot build here:
> >>
> >> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/
> >>
> >> Ticket:
> >> https://issues.apache.org/jira/browse/PDFBOX-5727
> >>
> >> Tilman
> >>
> >> On 05.12.2023 08:41, Kjetil Ødegaard wrote:
> >>> To clarify, this stack trace is not printed anywhere. I got it from
> >>> stepping into the code and invoking printStackTrace() on the exception to
> >>> get the whole stack. See complete stack trace below.
> >>>
> >>> I agree with your theory, it matches what I'm seeing. These fonts are never
> >>> added to the cache file, so the cache file is always rebuilt.
> >>>
> >>> I double checked the cache file again and there is no trace of these two
> >>> fonts, but lots of entries for other fonts (of different weights).
> >>> I see from the timestamp on the file that it is rebuilt on every run.
> >>>
> >>> BR Kjetil
> >>>
> >>> java.io.EOFException
> >>>     at org.apache.fontbox.ttf.TTFDataStream.readUnsignedShort(TTFDataStream.java:154)
> >>>     at org.apache.fontbox.ttf.TTFDataStream.readUnsignedShortArray(TTFDataStream.java:188)
> >>>     at org.apache.fontbox.ttf.GlyphSubstitutionTable.readMultipleSubstitutionSubtable(GlyphSubstitutionTable.java:412)
> >>>     at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupSubtable(GlyphSubstitutionTable.java:263)
> >>>     at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupTable(GlyphSubstitutionTable.java:313)
> >>>     at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupList(GlyphSubstitutionTable.java:247)
> >>>     at org.apache.fontbox.ttf.GlyphSubstitutionTable.read(GlyphSubstitutionTable.java:102)
> >>>     at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:365)
> >>>     at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:165)
> >>>     at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:144)
> >>>     at org.apache.fontbox.ttf.TrueTypeCollection.getFontAtIndex(TrueTypeCollection.java:127)
> >>>     at org.apache.fontbox.ttf.TrueTypeCollection.processAllFonts(TrueTypeCollection.java:109)
> >>>     at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.addTrueTypeCollection(FileSystemFontProvider.java:665)
> >>>     at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.scanFonts(FileSystemFontProvider.java:396)
> >>>     at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.<init>(FileSystemFontProvider.java:367)
> >>>     at org.apache.pdfbox.pdmodel.font.FontMapperImpl$DefaultFontProvider.<clinit>(FontMapperImpl.java:139)
> >>>     at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getProvider(FontMapperImpl.java:158)
> >>>     at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFont(FontMapperImpl.java:416)
> >>>     at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFontBoxFont(FontMapperImpl.java:379)
> >>>     at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getFontBoxFont(FontMapperImpl.java:353)
> >>>     at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:127)
> >>>
> >>> Tue, 5 Dec 2023 at 05:03, Tilman Hausherr <thaush...@t-online.de> wrote:
> >>>
> >>>> Please do also post the full (for pdfbox / fontbox) stack trace. I have
> >>>> a theory why it happens, which is that addTrueTypeCollection() does not
> >>>> add the font as "*skipexception*" to the cache file because it's not
> >>>> done in the exception handler.
> >>>>
> >>>> Tilman
> >>>>
> >>>> On 04.12.2023 21:17, Tilman Hausherr wrote:
> >>>>> Does the stack trace appear at every start? If yes then it's a bug.
> >>>>> The intent of the current code is that bad fonts aren't retried. The
> >>>>> font cache file should contain a line with "*skipexception*" for that
> >>>>> font. Can you look at it for the two font files?
> >>>>>
> >>>>> I could change SHA512 to CRC32. It has the advantage that it won't
> >>>>> trigger people who heard about MD5 😂
> >>>>>
> >>>>> I made a test and CRC32 is 20% faster.
> >>>>>
> >>>>> Tilman
> >>>>>
> >>>>> On 04.12.2023 18:48, Gili Tzabari wrote:
> >>>>>> I think the commit contains a typo:
> >>>>>>
> >>>>>> https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/FileSystemFontProvider.java?revision=1912514&view=markup&pathrev=1912514#l872
> >>>>>>
> >>>>>> private static String computeHash(byte[] ba)
> >>>>>> {
> >>>>>>     MessageDigest md;
> >>>>>>     try
> >>>>>>     {
> >>>>>>         md = MessageDigest.getInstance("SHA512");
> >>>>>>         byte[] md5 = md.digest(ba);
> >>>>>>         return Hex.getString(md5);
> >>>>>>     }
> >>>>>>     catch (NoSuchAlgorithmException ex)
> >>>>>>     {
> >>>>>>         // never happens
> >>>>>>         return "";
> >>>>>>     }
> >>>>>> }
> >>>>>>
> >>>>>> You shouldn't need to use SHA512 to detect changes by a non-malicious
> >>>>>> actor. MD5 should be plenty, and even CRC32 would be enough. I
> >>>>>> suggest downgrading the hash complexity.
> >>>>>>
> >>>>>> Gili
> >>>>>>
> >>>>>> On 2023-12-04 10:21, Kjetil Ødegaard wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I tried to upgrade an app to PDFBox 3.0.1 and I see a performance
> >>>>>>> issue.
> >>>>>>>
> >>>>>>> It only affects the first PDF operation (after that it's quite
> >>>>>>> fast), but it's a bit annoying since it takes about 20 seconds
> >>>>>>> (on my M1 MacBook). Profiling reveals that this Kotlin code
> >>>>>>> triggers the delay:
> >>>>>>>
> >>>>>>> val font = PDType1Font(Standard14Fonts.FontName.COURIER)
> >>>>>>>
> >>>>>>> The thread dump shows that almost all time is spent in this method:
> >>>>>>>
> >>>>>>> org.apache.pdfbox.pdmodel.font.FileSystemFontProvider#computeHash
> >>>>>>>
> >>>>>>> I assume that this is related to PDFBOX-5684.
> >>>>>>>
> >>>>>>> Is this possible to work around? Or is it possible to fix?
> >>>>>>>
> >>>>>>> BR Kjetil
> >>>>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> >>>>> For additional commands, e-mail: users-h...@pdfbox.apache.org
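
For readers following the checksum discussion in this thread, here is a minimal sketch of why the checksum field came out empty and how a CRC32 value can be computed instead. It is not the PDFBox code: the class name ChecksumSketch and the helpers crc32Hex and digestHex are hypothetical names made up for illustration. It assumes a stock JDK, where MessageDigest has no "CRC32" algorithm and CRC32 lives in java.util.zip. The catch block reports the failure instead of silently returning an empty string, which is the lesson about "// never happens" branches mentioned above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

// Hypothetical illustration class; not part of PDFBox.
class ChecksumSketch {

    // CRC32 is provided by java.util.zip, not by MessageDigest.
    static String crc32Hex(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        // getValue() returns the 32-bit checksum in a long; pad to 8 hex digits.
        return String.format("%08x", crc.getValue());
    }

    // Mirrors the shape of the quoted computeHash(), but reports the
    // failure on stderr before falling back to an empty string.
    static String digestHex(String algorithm, byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException ex) {
            // "Never happens" branches should still produce some output.
            System.err.println("Digest algorithm unavailable: " + algorithm);
            return "";
        }
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        // Standard CRC-32 check value for "123456789" is cbf43926.
        System.out.println("CRC32 via java.util.zip: " + crc32Hex(data));
        // On a stock JDK this logs to stderr and yields an empty string.
        System.out.println("CRC32 via MessageDigest: '" + digestHex("CRC32", data) + "'");
        System.out.println("SHA-512 hex length: " + digestHex("SHA-512", data).length());
    }
}
```

With a helper shaped like digestHex, asking MessageDigest for "CRC32" would have produced a visible message rather than an empty second-to-last field in the cache file.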