Tested the new snapshot. Performance looks good.

Cache file excerpt:

➜  ~ grep -i NotoSansKannada .pdfbox.cache
*skipexception*|TTF||0|0|0|0|0||/System/Library/Fonts/NotoSansKannada.ttc|b930924c|1700331239000
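
For anyone comparing the two excerpts in this thread, here is a minimal
sketch of how to pull the checksum out of a cache line. It assumes only
what was said in the thread: fields are separated by '|' and the checksum
is the second-to-last field. The class and method names are made up for
illustration and are not part of PDFBox.

```java
public class CacheLineFields {
    // Split one cache line on '|'. The -1 limit keeps empty fields,
    // which matters because an empty checksum is exactly what we look for.
    static String[] fields(String line) {
        return line.split("\\|", -1);
    }

    // Second-to-last field; per the thread this is where the checksum goes.
    static String checksumField(String line) {
        String[] f = fields(line);
        if (f.length < 2) {
            throw new IllegalArgumentException("not a cache line: " + line);
        }
        return f[f.length - 2];
    }

    public static void main(String[] args) {
        String before = "*skipexception*|TTF||0|0|0|0|0||"
                + "/System/Library/Fonts/NotoSansKannada.ttc||1700331239000";
        String after = "*skipexception*|TTF||0|0|0|0|0||"
                + "/System/Library/Fonts/NotoSansKannada.ttc|b930924c|1700331239000";
        System.out.println("before: '" + checksumField(before) + "'"); // before: ''
        System.out.println("after:  '" + checksumField(after) + "'");  // after:  'b930924c'
    }
}
```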

BR Kjetil

On Tue, 5 Dec 2023 at 15:10, Tilman Hausherr <thaush...@t-online.de> wrote:

> Thanks for the feedback. It turns out that there's another error
> (checksum was empty because MessageDigest doesn't support CRC32), which
> has been fixed now, please test again (delete the file first). The
> second-to-last field should now not be empty.
>
> It also teaches an important lesson: a "// never happens" segment should
> have an output.
>
> Tilman
>
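
To illustrate the two points above, a small standalone sketch (not the
actual PDFBox fix; class and method names are hypothetical). It computes
CRC32 with java.util.zip.CRC32, which is a separate API from
MessageDigest, and it reports a NoSuchAlgorithmException instead of
silently returning an empty string.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

public class ChecksumSketch {
    // CRC32 via java.util.zip, formatted as 8 lowercase hex digits.
    static String crc32Hex(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return String.format("%08x", crc.getValue());
    }

    // Hypothetical helper: hex digest for a MessageDigest algorithm name.
    // The "never happens" branch prints a message so a failure is visible.
    static String digestHex(String algorithm, byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException ex) {
            System.err.println("No MessageDigest for '" + algorithm + "': "
                    + ex.getMessage());
            return "";
        }
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println(crc32Hex(data));                      // cbf43926
        System.out.println(digestHex("SHA-512", data).length()); // 128
        // Asking MessageDigest for CRC32 is expected to hit the catch branch.
        System.out.println(digestHex("CRC32", data).isEmpty());
    }
}
```

The point of the second helper is only that the catch block says
something; whether to log, throw, or fall back is a design choice.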
> On 05.12.2023 11:34, Kjetil Ødegaard wrote:
> > Nice! Tested it now and I can confirm that it fixes the issue. I see good
> > performance even from the first operation.
> >
> > Checked the cache file and there is a line for this font there now:
> >
> > ➜  ~ grep -i NotoSansKannada .pdfbox.cache
> > *skipexception*|TTF||0|0|0|0|0||/System/Library/Fonts/NotoSansKannada.ttc||1700331239000
> >
> > Thanks for the quick response, great work!
> >
> > BR Kjetil
> >
> > On Tue, 5 Dec 2023 at 09:55, Tilman Hausherr <thaush...@t-online.de> wrote:
> >
> >> Thanks, new snapshot build here:
> >>
> >> https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/
> >>
> >> Ticket:
> >> https://issues.apache.org/jira/browse/PDFBOX-5727
> >>
> >> Tilman
> >>
> >> On 05.12.2023 08:41, Kjetil Ødegaard wrote:
> >>> To clarify, this stack trace is not printed anywhere. I got it from
> >>> stepping into the code and invoking printStackTrace() on the exception to
> >>> get the whole stack. See complete stack trace below.
> >>>
> >>> I agree with your theory, it matches what I'm seeing. These fonts are never
> >>> added to the cache file, so the cache file is always rebuilt.
> >>>
> >>> I double checked the cache file again and there is no trace of these two
> >>> fonts, but lots of entries for other fonts (of different weights). I see
> >>> from the timestamp on the file that it is rebuilt on every run.
> >>>
> >>> BR Kjetil
> >>>
> >>> java.io.EOFException
> >>>     at org.apache.fontbox.ttf.TTFDataStream.readUnsignedShort(TTFDataStream.java:154)
> >>>     at org.apache.fontbox.ttf.TTFDataStream.readUnsignedShortArray(TTFDataStream.java:188)
> >>>     at org.apache.fontbox.ttf.GlyphSubstitutionTable.readMultipleSubstitutionSubtable(GlyphSubstitutionTable.java:412)
> >>>     at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupSubtable(GlyphSubstitutionTable.java:263)
> >>>     at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupTable(GlyphSubstitutionTable.java:313)
> >>>     at org.apache.fontbox.ttf.GlyphSubstitutionTable.readLookupList(GlyphSubstitutionTable.java:247)
> >>>     at org.apache.fontbox.ttf.GlyphSubstitutionTable.read(GlyphSubstitutionTable.java:102)
> >>>     at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:365)
> >>>     at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:165)
> >>>     at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:144)
> >>>     at org.apache.fontbox.ttf.TrueTypeCollection.getFontAtIndex(TrueTypeCollection.java:127)
> >>>     at org.apache.fontbox.ttf.TrueTypeCollection.processAllFonts(TrueTypeCollection.java:109)
> >>>     at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.addTrueTypeCollection(FileSystemFontProvider.java:665)
> >>>     at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.scanFonts(FileSystemFontProvider.java:396)
> >>>     at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider.<init>(FileSystemFontProvider.java:367)
> >>>     at org.apache.pdfbox.pdmodel.font.FontMapperImpl$DefaultFontProvider.<clinit>(FontMapperImpl.java:139)
> >>>     at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getProvider(FontMapperImpl.java:158)
> >>>     at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFont(FontMapperImpl.java:416)
> >>>     at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFontBoxFont(FontMapperImpl.java:379)
> >>>     at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getFontBoxFont(FontMapperImpl.java:353)
> >>>     at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:127)
> >>>
> >>> On Tue, 5 Dec 2023 at 05:03, Tilman Hausherr <thaush...@t-online.de> wrote:
> >>>
> >>>> Please do also post the full (for pdfbox / fontbox) stack trace. I have
> >>>> a theory why it happens, which is that addTrueTypeCollection() does not
> >>>> add the font as "*skipexception*" to the cache file because it's not
> >>>> done in the exception handler.
> >>>>
> >>>> Tilman
> >>>>
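
To make the theory concrete, a minimal sketch of the intended behaviour:
when parsing fails, the exception handler itself records a skip marker, so
the file is not parsed again on the next scan. This is not the PDFBox
code; the class, the Parser interface and the in-memory map standing in
for the cache file are all hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class SkipMarkerSketch {
    static final String SKIP = "*skipexception*";

    // Hypothetical stand-in for a font parser that may fail on bad files.
    interface Parser {
        String parse(String path) throws IOException;
    }

    // Stand-in for the cache file: path -> parsed info or skip marker.
    final Map<String, String> cache = new LinkedHashMap<>();
    int parseCalls = 0;

    void scan(String path, Parser parser) {
        if (cache.containsKey(path)) {
            return; // already known, good or bad: do not parse again
        }
        parseCalls++;
        try {
            cache.put(path, parser.parse(path));
        } catch (IOException ex) {
            // The important part: remember the failure in the handler.
            cache.put(path, SKIP);
            System.err.println("Skipping " + path + ": " + ex);
        }
    }

    public static void main(String[] args) {
        SkipMarkerSketch s = new SkipMarkerSketch();
        Parser broken = p -> { throw new EOFException("truncated table"); };
        s.scan("Broken.ttc", broken);
        s.scan("Broken.ttc", broken); // second scan hits the cache
        System.out.println(s.parseCalls);              // 1
        System.out.println(s.cache.get("Broken.ttc")); // *skipexception*
    }
}
```

Without the put() in the catch block, parseCalls would grow on every scan,
which matches the "rebuilt on every run" symptom described in this thread.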
> >>>> On 04.12.2023 21:17, Tilman Hausherr wrote:
> >>>>> Does the stack trace appear at every start? If yes then it's a bug.
> >>>>> The intent of the current code is that bad fonts aren't retried. The
> >>>>> font cache file should contain a line with "*skipexception*" for that
> >>>>> font. Can you look at it for the two font files?
> >>>>>
> >>>>> I could change SHA512 to CRC32. It has the advantage that it won't
> >>>>> trigger people who heard about MD5 😂
> >>>>>
> >>>>> I made a test and CRC32 is 20% faster.
> >>>>>
> >>>>> Tilman
> >>>>>
> >>>>> On 04.12.2023 18:48, Gili Tzabari wrote:
> >>>>>> I think the commit contains a typo:
> >>>>>>
> >>>>>> https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/FileSystemFontProvider.java?revision=1912514&view=markup&pathrev=1912514#l872
> >>>>>>
> >>>>>>     private static String computeHash(byte[] ba)
> >>>>>>     {
> >>>>>>         MessageDigest md;
> >>>>>>         try
> >>>>>>         {
> >>>>>>             md = MessageDigest.getInstance("SHA512");
> >>>>>>             byte[] md5 = md.digest(ba);
> >>>>>>             return Hex.getString(md5);
> >>>>>>         }
> >>>>>>         catch (NoSuchAlgorithmException ex)
> >>>>>>         {
> >>>>>>             // never happens
> >>>>>>             return "";
> >>>>>>         }
> >>>>>>     }
> >>>>>>
> >>>>>> You shouldn't need to use SHA512 to detect changes by a non-malicious
> >>>>>> actor. MD5 should be plenty, and even CRC32 would be enough. I
> >>>>>> suggest downgrading the hash complexity.
> >>>>>>
> >>>>>> Gili
> >>>>>>
> >>>>>> On 2023-12-04 10:21, Kjetil Ødegaard wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I tried to upgrade an app to PDFBox 3.0.1 and I see a performance
> >>>>>>> issue.
> >>>>>>>
> >>>>>>> It only affects the first PDF operation (after that it's quite fast), but
> >>>>>>> it's a bit annoying since it takes about 20 seconds (on my M1 MacBook).
> >>>>>>> Profiling reveals that this Kotlin code triggers the delay:
> >>>>>>>
> >>>>>>>        val font = PDType1Font(Standard14Fonts.FontName.COURIER)
> >>>>>>>
> >>>>>>> The thread dump shows that almost all time is spent in this method:
> >>>>>>>
> >>>>>>> org.apache.pdfbox.pdmodel.font.FileSystemFontProvider#computeHash
> >>>>>>>
> >>>>>>> I assume that this is related to PDFBOX-5684.
> >>>>>>>
> >>>>>>> Is this possible to work around? Or is it possible to fix?
> >>>>>>>
> >>>>>>> BR Kjetil
> >>>>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> >>>>> For additional commands, e-mail: users-h...@pdfbox.apache.org
> >>>>>