Hi,

Thanks for your comments. I have fixed (1), which is an obvious bug. Yes, you can make a PR, although this is only used indirectly. You can also comment in the JIRA issue
https://issues.apache.org/jira/browse/PDFBOX-4189
If you don't have an account, see https://infra.apache.org/jira-guidelines.html#who (sorry for the inconvenience).
Tilman

On 16.01.2023 13:36, Vladimir Plizga wrote:
As of v3.0.0-alpha3, the brilliant FontBox library is capable of parsing
many data structures inside OTF/TTF fonts. However, not everything works
smoothly:

    1. Unlike other table readers, the
    'org.apache.fontbox.ttf.GlyphSubstitutionTable#read' method does not
    end with an 'initialized = true' statement. As a result, every subsequent
    invocation of the 'org.apache.fontbox.ttf.TrueTypeFont#getGsubData' method
    re-parses the GSUB table, so parsing happens at least once when the font
    is parsed and then again every time the method is called. Is it a bug or
    a non-obvious design decision?

    2. There are 2 closely related issues around the following behavior:
       1. In order to load GSUB data from a font, the library requires a
       language for which glyph substitutions must be built. The language is
       chosen based on the script tags provided by the font. The mapping between
       script tags and languages is provided by the
       'org.apache.fontbox.ttf.model.Language' enumeration, which currently
       supports only one language, and it is... Bengali. However, the library
       doesn't communicate the problem to the user directly (not even via logs).
       Instead, it silently falls back to the
       'org.apache.fontbox.ttf.model.GsubData#NO_DATA_FOUND' constant in the method
       'org.apache.fontbox.ttf.gsub.GlyphSubstitutionDataExtractor#getGsubData',
       making it quite hard to find the root cause of the problem. So the
       question is: would it be acceptable, from the backward compatibility
       point of view, to throw an exception in such cases?

       2. As stated in the JavaDoc of the 'org.apache.fontbox.ttf.model.Language'
       class, to support a new language one should add a new enumeration item
       and provide a corresponding 'org.apache.fontbox.ttf.gsub.GsubWorker'
       implementation. Indeed, the absence of the worker leads to an
       UnsupportedOperationException being thrown from the
       'org.apache.fontbox.ttf.gsub.GsubWorkerFactory#getGsubWorker' method.
       However, this makes it impossible to even load GsubData from a font, which
       is critical for (at least my) FontBox use case. As a possible solution, I'd
       suggest introducing a default GsubWorker implementation that would perform
       a no-op substitution (emitting a WARN message into the log for clarity) for
       any language that is not explicitly supported by the library. Additionally,
       an 'isStrict' flag with default value 'true' may be introduced to throw an
       exception instead of falling back to this new default implementation (much
       like the same-named flag in the
       'org.apache.fontbox.ttf.TrueTypeFont#getUnicodeCmapLookup(boolean)'
       method). Does it sound reasonable?
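
To make the proposal in 2.2 more concrete, here is a rough, self-contained
sketch of the fallback logic. It does not use the real FontBox types: the
names 'SketchGsubWorker', 'NoOpGsubWorker' and 'SketchGsubWorkerFactory', the
'applyTransforms' signature and the string language keys are all assumptions
made for illustration, and the actual GsubWorker interface may look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical stand-in for org.apache.fontbox.ttf.gsub.GsubWorker;
// the real interface may have a different shape.
interface SketchGsubWorker {
    List<Integer> applyTransforms(List<Integer> glyphIds);
}

// Proposed default worker: performs no substitution at all and
// returns a copy of the glyph IDs it was given.
final class NoOpGsubWorker implements SketchGsubWorker {
    @Override
    public List<Integer> applyTransforms(List<Integer> glyphIds) {
        return new ArrayList<>(glyphIds);
    }
}

// Hypothetical factory showing how an 'isStrict' flag could choose
// between throwing and falling back to the no-op worker.
final class SketchGsubWorkerFactory {
    private static final Logger LOG =
            Logger.getLogger(SketchGsubWorkerFactory.class.getName());

    // Workers for explicitly supported languages, keyed by language name
    // (an assumption; the library uses the Language enumeration).
    private final Map<String, SketchGsubWorker> supported;

    SketchGsubWorkerFactory(Map<String, SketchGsubWorker> supported) {
        this.supported = supported;
    }

    SketchGsubWorker getGsubWorker(String languageName, boolean isStrict) {
        SketchGsubWorker worker = supported.get(languageName);
        if (worker != null) {
            return worker;
        }
        if (isStrict) {
            // Strict mode: fail for unsupported languages.
            throw new UnsupportedOperationException(
                    "The language " + languageName + " is not yet supported");
        }
        // Lenient mode: warn and return the no-op worker.
        LOG.warning("No GSUB worker for language '" + languageName
                + "'; glyphs will be left unchanged");
        return new NoOpGsubWorker();
    }
}
```

With 'isStrict' set to false, a caller asking for an unsupported language
gets a worker that hands the glyph IDs back untouched, so GsubData can still
be loaded; with 'isStrict' set to true, the lookup throws as before.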

P.S. Is there a way to propose the described fixes in the form of a Pull
Request like it is usually done in many open source projects on GitHub?
This would make the discussion much closer to the code and thus
significantly more productive.

Cheers,
Vladimir


