At 2023-03-18T11:09:16+0000, Ralph Corderoy wrote:
> > The encoding of choice would probably be ISO 8859-7 in order to
> > remain within the 8 bit character encoding space.
> ...
> > 4. Write your documents in ISO 8859-7 or convert them from Unicode
> > to ISO 8859-7
>
> I'd recommend your second option; that Mortadelas writes in UTF-8 and
> uses preconv(1).

I too second this. We can add support for ISO 8859 Latin/Greek, but an
important question to address is whether composers of modern Greek
documents need that, or whether the false belief that "groff has no
UTF-8 support" is misleading people into thinking that using an 8-bit
character set is the only way they can get their language supported in
groff.

> > 2. Localize necessary strings (like "abstract", "contents", days of
> > the week etc.)
>
> This may not be needed, e.g. if a macro set isn't being used.

Also true.

> > > P.S: This
> > > <https://www.reddit.com/r/groff/comments/112tfqv/support_for_greek_in_groff/>
> > > is the post I am referring to.
>
> For others who may reply, this thread is worth a read to see what's
> already been suggested to Mortadelas.

We seem to be missing some support for recombining characters on
typesetting devices after decomposing them.

For example, I prepared the following document. [UTF-8 follows]

$ cat ATTIC/sample-greek.ms
.NH 1
Δεκεμβριανά
.LP
Η έναρξή τους, στις 3 Δεκεμβρίου του 1944, σηματοδοτείται από τους
πυροβολισμούς των Αστυνομικών δυνάμεων μπροστά στο μνημείο του άγνωστου
στρατιώτη ενάντια στη διαδήλωση του ΕΑΜ, που είχε οργανωθεί ως απάντηση
στο τελεσίγραφο της κυβέρνησης εθνικής ενότητας (1-12-1944) για τον
αφοπλισμό όλων των αντάρτικων ομάδων, με αποτέλεσμα το θάνατο 33
διαδηλωτών και τον τραυματισμό άλλων 148. Παράλληλα ο στρατηγός Σκόμπυ
προέβη σε διάγγελμα, ενώ άμεσες προσπάθειες για πολιτική λύση
απαγορεύτηκαν από τον Τσώρτσιλ.

This renders fine to a UTF-8 terminal with groff 1.22.4:

$ groff -k -ms -Tutf8 ATTIC/sample-greek.ms | cat -s
1. Δεκεμβριανά

Η έναρξή τους, στις 3 Δεκεμβρίου του 1944, σηματοδοτείται από
τους πυροβολισμούς των Αστυνομικών δυνάμεων μπροστά στο μνημείο
του άγνωστου στρατιώτη ενάντια στη διαδήλωση του ΕΑΜ, που είχε
οργανωθεί ως απάντηση στο τελεσίγραφο της κυβέρνησης εθνικής
ενότητας (1‐12‐1944) για τον αφοπλισμό όλων των αντάρτικων
ομάδων, με αποτέλεσμα το θάνατο 33 διαδηλωτών και τον
τραυματισμό άλλων 148. Παράλληλα ο στρατηγός Σκόμπυ προέβη σε
διάγγελμα, ενώ άμεσες προσπάθειες για πολιτική λύση
απαγορεύτηκαν από τον Τσώρτσιλ.

In groff 1.23.0, you will even be able to use nroff:

$ ./build/nroff -k -ms ATTIC/sample-greek.ms | cat -s
1. Δεκεμβριανά

Η έναρξή τους, στις 3 Δεκεμβρίου του 1944, σηματοδοτείται από
τους πυροβολισμούς των Αστυνομικών δυνάμεων μπροστά στο μνημείο
του άγνωστου στρατιώτη ενάντια στη διαδήλωση του ΕΑΜ, που είχε
οργανωθεί ως απάντηση στο τελεσίγραφο της κυβέρνησης εθνικής
ενότητας (1‐12‐1944) για τον αφοπλισμό όλων των αντάρτικων
ομάδων, με αποτέλεσμα το θάνατο 33 διαδηλωτών και τον
τραυματισμό άλλων 148. Παράλληλα ο στρατηγός Σκόμπυ προέβη σε
διάγγελμα, ενώ άμεσες προσπάθειες για πολιτική λύση
απαγορεύτηκαν από τον Τσώρτσιλ.

But when preparing DVI, PostScript, or PDF, we have a problem.

$ groff -k -ms -Tpdf ATTIC/sample-greek.ms >| ATTIC/sample-greek.pdf
troff: ATTIC/sample-greek.ms:2: warning: can't find special character 'u03B1_0301'
troff: ATTIC/sample-greek.ms:4: warning: can't find special character 'u03B5_0301'
troff: ATTIC/sample-greek.ms:4: warning: can't find special character 'u03B7_0301'
troff: ATTIC/sample-greek.ms:4: warning: can't find special character 'u03B9_0301'
troff: ATTIC/sample-greek.ms:4: warning: can't find special character 'u03BF_0301'
troff: ATTIC/sample-greek.ms:5: warning: can't find special character 'u03C5_0301'
troff: ATTIC/sample-greek.ms:5: warning: can't find special character 'u03C9_0301'

What is happening is that letters with the acute accent (Greek: tonos)
are getting dropped.
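
As a cross-check on those warnings, here is a small sketch in Python
(not part of groff; the function name groff_glyph_name is made up for
illustration). It assumes the names in the warnings are simply the
canonical decomposition (NFD) of each precomposed letter, written as
hexadecimal code points joined with underscores. Under that assumption
it reproduces the seven names above from the seven tonos vowels.

```python
import unicodedata


def groff_glyph_name(ch: str) -> str:
    """Build a uXXXX_XXXX... style name from the NFD form of one character.

    Hypothetical helper: it mirrors the naming seen in the troff
    warnings, not any actual groff code.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    return "u" + "_".join(f"{ord(c):04X}" for c in decomposed)


# Precomposed lowercase Greek vowels with tonos (U+03AC..U+03AF, U+03CC..U+03CE).
TONOS_VOWELS = [0x03AC, 0x03AD, 0x03AE, 0x03AF, 0x03CC, 0x03CD, 0x03CE]

if __name__ == "__main__":
    for code_point in TONOS_VOWELS:
        ch = chr(code_point)
        print(f"U+{code_point:04X} {ch} -> {groff_glyph_name(ch)}")
```

A letter with no canonical decomposition, such as plain alpha (U+03B1),
comes back as a single code point, so only the accented vowels end up
with the two-part names that the typesetter fonts lack.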

preconv(1) produces them in precomposed form (Unicode Normalization
Form C), which is fine for terminals, but not necessarily the right
thing to do on typesetters. GNU troff therefore decomposes them. But
it appears that some logic is missing for recombining them.

What I'm not sure about is what component of the system has the missing
functionality.

In the old days (the 1970s and 1980s, as seen in the accent mark support
of ms(7) and me(7)), you'd just barrel forward, formatting the base
character and combining character together with the \o escape sequence.

This approach breaks down when you need to apply multiple combining
characters, as happens perhaps most famously with Vietnamese, but also
with seemingly simpler scripts like the Pinyin romanization of Mandarin.

https://savannah.gnu.org/bugs/index.php?57524

My understanding is that modern font formats like OpenType (and
TrueType?) are supposed to be smart, and are able to handle this
situation with aplomb, though there are surely limits to this and I have
no idea how exceeding those limits is supposed to be communicated back
to typesetting software.

But as far as I know the PostScript Type 1 fonts that we work with
_aren't_ smart, which leaves the problem in groff's hands.

As an experiment I tried the following crude workaround, called
"sample-greek2.groff".

.\" On typesetters only, overstrike each base vowel with an acute accent.
.if t \{\
. char \[u03B1_0301] \o'\[u03B1]\[u00B4]'
. char \[u03B5_0301] \o'\[u03B5]\[u00B4]'
. char \[u03B7_0301] \o'\[u03B7]\[u00B4]'
. char \[u03B9_0301] \o'\[u03B9]\[u00B4]'
. char \[u03BF_0301] \o'\[u03BF]\[u00B4]'
. char \[u03C5_0301] \o'\[u03C5]\[u00B4]'
. char \[u03C9_0301] \o'\[u03C9]\[u00B4]'
.\}
\[u03B1_0301] \[u03B5_0301] \[u03B7_0301] \[u03B9_0301] \[u03BF_0301] \[u03C5_0301] \[u03C9_0301]
.pl \n[nl]u

This works...kind of. It beats dropping characters entirely, but to my
eyesight the acute accents aren't truly centered over the base glyph.
This might have to do with the glyphs being italic instead of upright;
the latter is surely implied by the code points being in the Unicode
Greek and Coptic block (U+0370-03FF).

Maybe this is a problem with the Ghostscript 9.53.3 fonts. But this:

$ groff -Tpdf -P -y -P U ATTIC/sample-greek2.groff >| \
  ATTIC/sample-greek2.pdf

...has the same problems. Worse, in fact, since the acute accent in
this version of URW Times roman is grazing the tops of the lowercase
Greek letters.

Do these fonts just suck? Does someone have a good Type 1 Greek font to
recommend? With that in hand it may be easier to decide what groff can
do better (apart from native TTF and OTF support).

Regards,
Branden