[
https://issues.apache.org/jira/browse/PDFBOX-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18084725#comment-18084725
]
Volker Kunert commented on PDFBOX-4951:
---------------------------------------
1 Could it be possible to just pass the processor, and hide this double loading
of the font in the pdfbox library code? I'm trying to make it simple for the
users. Ideally stuff should "just work". If we load a font twice it may happen
that people load a different font.
I am not sure what the processor should be passed to.
Currently we have the following interface:
PDType0Font font = glyphLayoutProcessorAwt.loadFont(pdDocument, inputStream,
embedSubset, fontOptions);
It is simple, works, and the double loading is hidden.
Behind the scenes the call is delegated from glyphLayoutProcessorAwt
to glyphLayoutFontLoader, that loads the PDFBOX-font and the AWT-font
and populates the mapping table.
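
The delegation in point 1 only works if the PDFBox font and the AWT font are built from the same data. A minimal sketch of one way to guarantee that is to buffer the caller's stream once and hand out independent streams over the same bytes. `BufferedFontSource` is a hypothetical name used for illustration; it is not part of PDFBox or of the patch, and the two load calls shown in the comment are only an assumption about how a loader might use it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper (not part of PDFBox): buffers a font stream once. */
final class BufferedFontSource {
    private final byte[] data;

    BufferedFontSource(InputStream in) {
        try {
            // Read the caller's stream exactly once (Java 9+).
            this.data = in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns a fresh, independent stream over the same font bytes. */
    ByteArrayInputStream open() {
        return new ByteArrayInputStream(data);
    }

    public static void main(String[] args) {
        byte[] standIn = {0, 1, 0, 0}; // stand-in bytes, not a real font file
        BufferedFontSource source =
                new BufferedFontSource(new ByteArrayInputStream(standIn));
        // A loader could then create both fonts from identical data, e.g.:
        //   PDType0Font pdFont = PDType0Font.load(pdDocument, source.open(), embedSubset);
        //   Font awtFont = Font.createFont(Font.TRUETYPE_FONT, source.open());
        System.out.println(source.open().available());
    }
}
```

Because every call to `open()` reads the same buffered bytes, the user cannot accidentally supply a different font for the second load.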
2 Let's say people want a different processor. Could you create a
FontProcessorInterface that has the minimal features?
This can be done, e.g. with GlyphLayoutProcessor as the interface and
GlyphLayoutProcessorAwt as the implementation.
3 Would this extra loading of the font work with ttc files?
This is not implemented. It could look like this:
PDType0Font[] fontList = glyphLayoutProcessorAwt.loadFontTtc(pdDocument,
inputStream, embedSubset, fontOptions);
Not sure if it is worth the effort.
4 Is it correct that Arabic works nicely by default but Bengali needs
activation of ligatures?
No, no activation of ligatures is necessary.
5 Please replace <p/> with <p>
OK
6 showGlyphsWithPositioning() is package local, isn't it? Or do we want to make
this accessible for people who make their own?
OK, I will make it public
7 "greek letters extended" looks like this: Ά Έ Ή Ί Ό
This is correct: the list starts with GREEK CAPITAL LETTER ALPHA WITH TONOS,
etc. See also the table "Greek letters (gl)" in
https://en.wikipedia.org/wiki/DIN_91379
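
The character names in point 7 can be checked directly against the Unicode data shipped with the JDK. This is a small sketch, not part of the patch; the class name `CodePointNames` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper: lists the Unicode name of each code point in a string. */
final class CodePointNames {
    static List<String> names(String text) {
        List<String> result = new ArrayList<>();
        // Iterate by code point, not by char, so supplementary characters work too.
        text.codePoints().forEach(cp ->
                result.add(String.format("U+%04X %s", cp, Character.getName(cp))));
        return result;
    }

    public static void main(String[] args) {
        // U+0386 U+0388 U+0389 U+038A U+038C, the letters shown in point 7
        for (String line : names("\u0386\u0388\u0389\u038A\u038C")) {
            System.out.println(line);
        }
    }
}
```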
8 Can we get rid of "showTextPDType0Font(GlyphVector glyphVector, ..." and keep
this to GlyphLayoutProcessor? This way we won't have AWT in
PDAbstractContentStream.
I can move it to GlyphLayoutProcessorAwt.
9 I looked at my old texts here, so it seems I changed my mind from 2020. What
I remember is that it was too much work, and then I didn't have enough free
time. However I see I noticed the double loading of the font.
10 Would this work with Thai?
The example for PDFBOX-3147 seems to be OK.
!example-PDFBOX-3147-NotoSansThaiLooped-Regular.png!
11 ifMixedThenDivideTextAndShow() should be split so that it either does
something, or returns something. Currently it does both and this is confusing.
I can change this: do the Bidi test in showText and call a new
method, showTextUni(part), for each uni-directional part.
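
The split proposed in point 11 could be sketched with java.text.Bidi from the JDK. This is only an illustration of the idea, not the patch code: `BidiRuns` and `split` are hypothetical names, and the sketch returns the parts in logical order without any reordering for display.

```java
import java.text.Bidi;
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper: splits text into uni-directional parts (logical order). */
final class BidiRuns {
    static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        if (text.isEmpty()) {
            return parts;
        }
        // Base direction is taken from the first strong character, LTR if none.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        if (!bidi.isMixed()) {
            // Purely LTR or purely RTL: nothing to divide.
            parts.add(text);
            return parts;
        }
        for (int i = 0; i < bidi.getRunCount(); i++) {
            parts.add(text.substring(bidi.getRunStart(i), bidi.getRunLimit(i)));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Latin text followed by three Hebrew letters.
        for (String part : split("abc \u05D0\u05D1\u05D2")) {
            // This is where a method such as showTextUni(part) would be called.
            System.out.println(part.length());
        }
    }
}
```

Here the method only returns the parts, and the caller decides what to do with each one, which is the separation asked for in the question.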
12 Assuming you know a lot about this part of awt, does it also contain
something for vertical fonts? We use GSUB for replacement of vertical glyphs.
Sorry, I don't know much about AWT, and I did not find options for vertical
fonts. Java embeds the HarfBuzz library but restricts its use; only some
features are available.
13 I'll run another copilot review later which is likely to be uncomfortable.
No problem
14 If we take this (I like it and we have often been asked about complex
scripts) we may need an ICLA
This is OK.
> Sequences of DIN SPEC 91379 with combining letters are rendered incorrectly
> ---------------------------------------------------------------------------
>
> Key: PDFBOX-4951
> URL: https://issues.apache.org/jira/browse/PDFBOX-4951
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.21
> Reporter: Volker Kunert
> Priority: Major
> Attachments: DIN_SPEC_91379_Sequences-aa.pdf,
> DIN_SPEC_91379_Sequences-ab.pdf, DIN_SPEC_91379_Sequences-ac.pdf,
> DIN_SPEC_91379_Sequences.txt, DefaultScriptProcessor.java, DejaVuSans.ttf,
> DoGlyphLayoutBidi.pdf, DoGlyphLayoutDinSpec91379.pdf,
> DoGlyphLayoutDinSpec91379Form.pdf, DoGlyphPositionBengali.pdf,
> ExamplePdfboxFopPos-By-Tilman.pdf, ExamplePdfboxFopPos.java,
> ExamplePdfboxFopPos.pdf, ExamplePdfboxFopPosForm.java,
> ExamplePdfboxFopPosForm.pdf, FiraCode-Regular.ttf,
> FontForge-Lohit-Bengali.png, TestPdfbox.java, TestPdfboxFop2.java,
> TestPdfboxFop2.pdf, TestPdfboxJava2D.java, TestPdfboxJava2D.pdf, bidi-1.png,
> bidi-2.png, bidi.png, example-PDFBOX-3147-NotoSansThaiLooped-Regular.png,
> image-2026-05-23-16-16-53-442.png, image-2026-05-23-16-17-28-172.png,
> image-2026-05-26-16-49-45-529.png, ligatures-kerning.png,
> patch-2020-10-02.txt, pdfbox.patch, pdfbox.pdf, screenshot-1.png
>
>
> Accented letters composed of a Unicode base letter and a combining accent are
> rendered incorrectly. E.g. with 0041 030B LATIN CAPITAL LETTER A WITH COMBINING
> DOUBLE ACUTE ACCENT the accent appears at the right-hand side of the letter
> A, not above the letter A.
> The position is wrong for most of the sequences defined in the following spec:
> DIN SPEC 91379: Characters in Unicode for the electronic processing of names
> and data exchange in Europe; with digital attachment
> [https://www.xoev.de/downloads-2316#StringLatin]
> [https://www.din.de/de/wdc-beuth:din21:301228458]
>
> The correct rendering should look like the output of hb-view 2.6.8, see files
> DIN_SPEC_91379_Sequences*.pdf.
> The output of PDFBox is appended in pdfbox.pdf, which is created by running
> TestPdfbox.java. The sequences are read from file
> DIN_SPEC_91379_Sequences.txt.
>
> Font used for testing: NotoSansMono-Regular.ttf, see
> [https://www.google.com/get/noto/]
> download:
> [https://noto-website-2.storage.googleapis.com/pkgs/NotoSansMono-hinted.zip]
> See also FOP-2969
>
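
The sequence 0041 030B cited in the description has no precomposed Unicode character, so normalizing the text to NFC cannot work around the problem and the combining accent has to be positioned by the renderer. A small sketch that checks this with java.text.Normalizer follows; `CombiningSequenceCheck` is an illustrative name and not part of PDFBox.

```java
import java.text.Normalizer;

/** Illustrative check: does a base + combining sequence compose to one code point? */
final class CombiningSequenceCheck {
    static boolean hasPrecomposedForm(String sequence) {
        String nfc = Normalizer.normalize(sequence, Normalizer.Form.NFC);
        // A single code point after NFC means a precomposed character exists.
        return nfc.codePointCount(0, nfc.length()) == 1;
    }

    public static void main(String[] args) {
        // A + COMBINING DOUBLE ACUTE ACCENT: stays a two-code-point sequence.
        System.out.println(hasPrecomposedForm("A\u030B"));
        // O + COMBINING DOUBLE ACUTE ACCENT: composes to U+0150.
        System.out.println(hasPrecomposedForm("O\u030B"));
    }
}
```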