Bruno Haible wrote:
> Alexander E. Patrakov wrote:
>> The answer "patch
>> glibc so that iconv transliterates the bullet to 'o'" is better (and in
>> fact this is doable), but I think that users of non-Glibc systems (or
>> old Glibc) will complain if this becomes the official answer.
> Why should they complain? They can use GNU libiconv. It transliterates the
> bullet to 'o', like you wish.
The "iconv" program from libiconv transliterates the bullet to ".",
which is also acceptable. Also, it transliterates quotes nicely. Thanks!
As for the "iconv" program from glibc, the situation is worse. I have
prepared a patch against Glibc-2.3.6 (attached) that transliterates the
offending characters produced by Groff into their ASCII equivalents if
there is no other suitable fallback. You can try it without
rebuilding glibc by applying it to the installed copy of the
"translit_neutral" file (in /usr/share/i18n/locales) and rebuilding all
locales with localedef. The patch works in all locales except "C" (see
below), but libiconv provides nicer quotes. Is this patch the right solution?
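
To make the fallback behaviour concrete, here is a minimal Python sketch of how I read the rules in the attached patch: each source character has a semicolon-separated list of alternatives, and the first alternative that the target character set can represent is used. The parser, the function names and the "?" substitute are my own simplifications for illustration; this is not how localedef or iconv are actually implemented.

```python
import re

# A few rules copied from the patched translit_neutral file (see the patch).
RULES = """
% ACUTE ACCENT
<U00B4> <U0027>
% MIDDLE DOT
<U00B7> <U002E>
% PRIME
<U2032> <U00B4>;<U0027>
% DOUBLE PRIME
<U2033> "<U2032><U2032>";"<U00B4><U00B4>";"<U0027><U0027>"
% BULLET OPERATOR
<U2219> <U2022>;<U00B7>;<U002E>
"""


def decode(token):
    """Turn '<U2032><U2032>' (quoted or not) into the string it denotes."""
    return "".join(chr(int(h, 16)) for h in re.findall(r"<U([0-9A-Fa-f]{4})>", token))


def parse_rules(text):
    """Map each source character to its ordered list of alternatives."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        source, alternatives = line.split(None, 1)
        table[decode(source)] = [decode(alt) for alt in alternatives.split(";")]
    return table


def encodable(s, charset):
    """True if every character of s exists in the target charset."""
    try:
        s.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def transliterate(text, table, charset):
    """Replace unencodable characters by their first encodable alternative."""
    out = []
    for ch in text:
        if encodable(ch, charset):
            out.append(ch)
            continue
        for candidate in table.get(ch, []):
            if encodable(candidate, charset):
                out.append(candidate)
                break
        else:
            out.append("?")  # no usable fallback at all
    return "".join(out)


TABLE = parse_rules(RULES)
print(transliterate("\u2219 x\u2032", TABLE, "ascii"))    # prints ". x'"
print(transliterate("\u2219 x\u2032", TABLE, "latin-1"))  # middle dot and acute accent survive
```

With ISO-8859-1 as the target, the existing alternatives (middle dot, acute accent) are still picked first. The ASCII alternatives appended by the patch only come into play when nothing better fits.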
As for the "C" locale, the problem is that "iconv" from Glibc uses
transliteration data from the current locale (e.g., in order to
substitute ä with ae in German locales), and such a locale-specific
transliteration table is missing for the "C" locale (which IMHO is a
Glibc bug). In contrast to that, libiconv bases its decisions only upon
the source and destination character sets.
So, if you agree with all of the above, please help me formulate a
well-stated bug report against Glibc. A (very rough) draft is below.
Bug 1.
Subject: Allow transliteration in the "C" locale.
Component: libc
Description:
The iconv function from libiconv performs some useful transliterations
(e.g., replacing fancy quotes with their ASCII equivalents) in any
locale. iconv from Glibc doesn't do that and relies solely upon the
transliteration data from the current locale. Thus, there are no
transliterations in the "C" locale, although they would be useful.
The iconv function from glibc should probably, instead, rely upon the
union of locale-agnostic transliteration rules (like those from
libiconv) and locale-specific overrides.
Bug 2.
Subject: Transliterate quotes and bullets in all locales.
Component: localedata
Description:
The iconv function from libiconv performs some useful transliterations
(e.g., replacing the quotes with their ASCII equivalents and the middle
dot with ASCII dot) in all locales. The iconv implementation from Glibc
doesn't always do this. This deficiency is going to hurt future Groff
users, as described in [link to this thread]. Attached is a patch that
implements the needed transliteration rules. See also [Bug 1] for the
related issue with the "C" locale.
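
To illustrate what the first report asks for (a union of locale-agnostic rules and locale-specific overrides), here is a rough Python sketch. Both tables and both function names are hypothetical and exist only for this example; they are not real Glibc locale data and do not describe the Glibc code.

```python
# Hypothetical locale-agnostic rules, available even in the "C" locale.
NEUTRAL = {
    "\u00b7": ["."],   # MIDDLE DOT
    "\u2019": ["'"],   # RIGHT SINGLE QUOTATION MARK
    "\u00e4": ["a"],   # illustrative default for a-umlaut
}

# Hypothetical German override: a-umlaut becomes "ae".
GERMAN = {"\u00e4": ["ae"]}


def effective_table(neutral, locale_specific=None):
    """Union of both tables; locale-specific entries win on conflict."""
    merged = dict(neutral)
    merged.update(locale_specific or {})
    return merged


def translit_ascii(text, table):
    """Replace each non-ASCII character by its first ASCII alternative."""
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)
        else:
            alternatives = [alt for alt in table.get(ch, []) if alt.isascii()]
            out.append(alternatives[0] if alternatives else "?")
    return "".join(out)


# "C" locale: only the neutral rules apply.
print(translit_ascii("B\u00e4r \u00b7", effective_table(NEUTRAL)))          # prints "Bar ."
# German locale: the override replaces the neutral rule for a-umlaut.
print(translit_ascii("B\u00e4r \u00b7", effective_table(NEUTRAL, GERMAN)))  # prints "Baer ."
```

The point of the sketch is only that the "C" locale would still get the neutral rules, while a locale with its own table keeps its preferred spellings.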
--
Alexander E. Patrakov
Submitted By: Alexander E. Patrakov
Date: 2006-01-26
Initial Package Version: 2.3.6
Upstream Status: Discussing
Origin: Alexander E. Patrakov
Description: Transliterates some characters (e.g., ones created by groff -Tutf8)
into their ASCII approximations.
--- glibc-2.3.6/localedata/locales/translit_neutral 2006-01-26 13:52:16.000000000 +0500
+++ glibc-2.3.6/localedata/locales/translit_neutral 2006-01-26 11:15:17.000000000 +0500
@@ -26,6 +26,10 @@
<U00AD> <U002D>
% REGISTERED SIGN
<U00AE> "<U0028><U0052><U0029>"
+% ACUTE ACCENT
+<U00B4> <U0027>
+% MIDDLE DOT
+<U00B7> <U002E>
% CEDILLA
<U00B8> <U002C>
% RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
@@ -39,9 +43,9 @@
% LATIN SMALL LETTER AE
<U00E6> "<U0061><U0065>"
% MODIFIER LETTER PRIME
-<U02B9> <U2032>;<U00B4>
+<U02B9> <U2032>;<U00B4>;<U0027>
% MODIFIER LETTER DOUBLE PRIME
-<U02BA> <U2033>;"<U00B4><U00B4>"
+<U02BA> <U2033>;"<U00B4><U00B4>";"<U0027><U0027>"
% MODIFIER LETTER TURNED COMMA
<U02BB> <U2018>
% MODIFIER LETTER APOSTROPHE
@@ -55,7 +59,7 @@
% MODIFIER LETTER MACRON
<U02C9> <U00AF>
% MODIFIER LETTER ACUTE ACCENT
-<U02CA> <U00B4>
+<U02CA> <U00B4>;<U0027>
% MODIFIER LETTER GRAVE ACCENT
<U02CB> <U0060>
% MODIFIER LETTER LOW MACRON
@@ -101,11 +105,11 @@
% NARROW NO-BREAK SPACE
<U202F> <U00A0>;<U0020>
% PRIME
-<U2032> <U00B4>
+<U2032> <U00B4>;<U0027>
% DOUBLE PRIME
-<U2033> "<U2032><U2032>";"<U00B4><U00B4>"
+<U2033> "<U2032><U2032>";"<U00B4><U00B4>";"<U0027><U0027>"
% TRIPLE PRIME
-<U2034> "<U2032><U2032><U2032>";"<U00B4><U00B4><U00B4>"
+<U2034> "<U2032><U2032><U2032>";"<U00B4><U00B4><U00B4>";"<U0027><U0027><U0027>"
% REVERSED PRIME
<U2035> <U0060>
% REVERSED DOUBLE PRIME
@@ -155,7 +159,7 @@
% ASTERISK OPERATOR
<U2217> <U002A>
% BULLET OPERATOR
-<U2219> <U2022>;<U00B7>
+<U2219> <U2022>;<U00B7>;<U002E>
% DIVIDES
<U2223> <U007C>
% RATIO
@@ -171,13 +175,13 @@
% MUCH GREATER-THAN
<U226B> "<U003E><U003E>"
% DOT OPERATOR
-<U22C5> <U00B7>
+<U22C5> <U00B7>;<U002E>
% VERY MUCH LESS-THAN
<U22D8> "<U003C><U003C><U003C>"
% VERY MUCH GREATER-THAN
<U22D9> "<U003E><U003E><U003E>"
% MIDLINE HORIZONTAL ELLIPSIS
-<U22EF> "<U00B7><U00B7><U00B7>"
+<U22EF> "<U00B7><U00B7><U00B7>";"<U002E><U002E><U002E>"
% SYMBOL FOR NULL
<U2400> "<U004E><U0055><U004C>"
% SYMBOL FOR START OF HEADING