Re: [Groff] ubuntu, groff and utf-8

Michail Vidiassov Tue, 08 Mar 2005 04:42:41 -0800

Dear Werner,
you wrote:

This is correct, unfortunately.  groff doesn't yet support UTF8 input.
You have to convert your file first to something groff can understand.

Below is a small perl script which does that.  Note that it doesn't
`fake' glyphs, that is, it doesn't construct, say, `Amacron' from an
`A' and a `macron' glyph.  Any volunteer for this?

======================================================================

#! /usr/bin/perl -w
#
# uni2groff.pl
#
# Convert input in UTF8 encoding to something groff 1.19 or greater
# can understand.  It simply converts all Unicode values >= U+0080
# to the form \[uXXXX].
#
# Usage:
#
#   perl uni2groff.pl < infile > outfile
#
# You need perl 5.6 or greater.

use strict;

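# Decode standard input as UTF8 so that ord() sees code points, not bytes.
binmode(STDIN, ":utf8");

while (<>) {
  # Replace every character outside Basic Latin (U+0000..U+007F)
  # with a \[uXXXX] escape.
  s/(\P{InBasicLatin})/sprintf("\\[u%04X]", ord($1))/eg;
  print;
}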

# EOF

There seems to be a problem with this script. If the data contains an `Amacron' (U+0100), the script produces the glyph name `u0100'. But groff names glyphs in decomposed form, so the glyph name for `Amacron' is `u0041_0304'. You can see this from the unicode_decomposed hash in afmtodit and from uniglyph.cpp and glyphuni.cpp. Thus your script has to be made a bit longer by including the unicode_decomposed hash ;)
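
To illustrate the point, here is a rough sketch of a decomposing variant. It does not embed the unicode_decomposed hash; instead it assumes that canonical decomposition (NFD) from the Unicode::Normalize module gives the same sequences as groff's table, which I have not checked entry by entry. The function names decomposed_glyph and decompose_line are made up for this example. Unicode::Normalize ships with perl 5.8, so this needs a newer perl than the original script.

```perl
#! /usr/bin/perl -w

use strict;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: build a groff glyph escape for one character,
# joining the code points of its canonical decomposition with `_'.
# ASSUMPTION: NFD agrees with groff's unicode_decomposed table.
sub decomposed_glyph {
  my ($char) = @_;
  my @parts = map { sprintf("%04X", ord($_)) } split(//, NFD($char));
  return "\\[u" . join("_", @parts) . "]";
}

# Hypothetical helper: convert every non-ASCII character in a line.
sub decompose_line {
  my ($line) = @_;
  $line =~ s/([^\x00-\x7F])/decomposed_glyph($1)/eg;
  return $line;
}

# Example: U+0100 (Amacron) followed by plain ASCII.
print decompose_line("\x{0100}bc\n");
```

The example line prints `\[u0041_0304]bc`. Characters without a decomposition keep the single-value form, e.g. U+00DF becomes `\[u00DF]`. One limitation of this sketch: input that is already decomposed (an ASCII `A' followed by U+0304) is not merged into one glyph name, since the `A' is left untouched.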

It may also be a good idea to replace (optionally, where possible) the Unicode glyph names with the (roughly two-character) groff glyph names, the way input.cpp does using unicode_to_glyph_list, and following the precedents in latin?.tmac. This would make the output more portable and human-readable.
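
A sketch of what this could look like follows. The small %groff_name table is only an illustrative subset that I typed in by hand from the usual groff glyph names; it is not the real unicode_to_glyph_list, and the function names are invented for the example. Characters missing from the table fall back to the decomposed uXXXX form, again assuming that NFD from Unicode::Normalize matches groff's decomposition.

```perl
#! /usr/bin/perl -w

use strict;
use Unicode::Normalize qw(NFD);

# ASSUMPTION: hand-picked illustrative subset, not groff's full table.
my %groff_name = (
  0x00A9 => 'co',    # copyright sign
  0x00DF => 'ss',    # sharp s
  0x00E4 => ':a',    # a with dieresis
  0x00E9 => "'e",    # e with acute
  0x2014 => 'em',    # em dash
);

# Hypothetical helper: prefer the short groff name, else fall back
# to the decomposed \[uXXXX_YYYY] form.
sub glyph_escape {
  my ($char) = @_;
  my $code = ord($char);
  return "\\[" . $groff_name{$code} . "]" if exists $groff_name{$code};
  my @parts = map { sprintf("%04X", ord($_)) } split(//, NFD($char));
  return "\\[u" . join("_", @parts) . "]";
}

# Hypothetical helper: convert every non-ASCII character in a line.
sub convert_line {
  my ($line) = @_;
  $line =~ s/([^\x00-\x7F])/glyph_escape($1)/eg;
  return $line;
}

# Example: e-acute and em dash have short names, Amacron does not.
print convert_line("caf\x{E9} \x{2014} \x{0100}\n");
```

The example line prints `caf\['e] \[em] \[u0041_0304]`. Making the short names optional, as suggested above, would only need a command-line switch that skips the table lookup.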

Sincerely,
Michail

PS. Are you sure that mapping `la' and `ra' to 0x27E8 (MATHEMATICAL LEFT ANGLE BRACKET) and 0x27E9 in the devutf8 fonts (and other places) is a good idea? I do not think many fonts have those math symbols, while `la' and `ra' are often used in roff files in non-math contexts.


_______________________________________________
Groff mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/groff
