Bruno Haible wrote:
As for the "iconv" program from glibc, the situation is worse. I have
prepared a patch against Glibc-2.3.6 (attached) that transliterates the
offending characters produced by Groff into their ASCII equivalents if
there is no other suitable fallback. You can try it without
rebuilding glibc by applying it to the installed copy of the
"translit_neutral" file (in /usr/share/i18n/locales) and rebuilding all
locales with localedef. The patch works in all locales except "C" (see
below) ... Is this patch the right solution?
The BULLET, PRIME and DOT/ELLIPSIS parts are probably acceptable.
The ACUTE ACCENT part looks wrong.
But libiconv also transliterates it to "'" :)
1. An acute accent is not a quoting character. Anyone using an acute
accent for quoting is abusing this character.
Agreed, Groff should be fixed. Also it probably should use Unicode
bullets (not middle dots) for bullets.
2. U+0027 is an apostrophe, a small vertical line, that doesn't change
when mirrored left<->right.
That's still better than a question mark or nothing.
When you submit a patch for "translit_neutral", you also need to make
the corresponding changes to locale/C-translit.h.in.
Corrected patch attached. Parts that you disagreed with are commented out.
I would split this into two different patches, simply to increase the
chances of having at least one of them accepted. - As I said above,
transliterating ACUTE ACCENT to APOSTROPHE is simply wrong.
I will split the ACUTE ACCENT part into a separate patch as soon as you
comment upon the behaviour of libiconv in this case.
The revised text of the bug report:
Subject: Transliterate quotes and bullets in all locales.
Component: localedata
Description:
The iconv function from libiconv performs some useful transliterations
(e.g., replacing the quote-like characters with their ASCII equivalents
and the middle dot with an ASCII dot) in all locales. The iconv implementation
in Glibc doesn't always do this. This deficiency is going to hurt
future Groff users, as described in [link to this thread]. Attached is a
patch that implements the needed transliteration rules. The ACUTE ACCENT
part has been commented out because Bruno Haible thinks it should not be
done in this way, but I disagree with him because APOSTROPHE is better
than a question mark as a replacement for ACUTE ACCENT. Implement this
as you wish, but note that Groff does abuse ACUTE ACCENT as a quoting
character, and without the commented-out parts iconv replaces it with a
question mark in some locales.
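
To make the "question mark versus apostrophe" point concrete, here is a
rough Python model of a fallback chain. It is only a sketch, not Glibc's
actual code: the function names are made up for illustration, Python's own
codecs stand in for the target charset, and I am assuming that the
alternatives of a rule are tried in the order listed until one of them is
representable.

```python
# Rough model of a transliteration fallback chain; not glibc's actual code.

def representable(text, charset):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def transliterate(text, rules, charset):
    """Convert text for charset, using the first usable alternative per rule."""
    out = []
    for ch in text:
        if representable(ch, charset):
            out.append(ch)
            continue
        # Try each alternative in the order the rule lists them.
        for candidate in rules.get(ch, []):
            if representable(candidate, charset):
                out.append(candidate)
                break
        else:
            out.append("?")  # no rule, or no alternative fits the charset
    return "".join(out)


# PRIME (U+2032) before and after the patch, as in translit_neutral.
OLD_RULES = {"\u2032": ["\u00b4"]}
NEW_RULES = {"\u2032": ["\u00b4", "\u0027"]}

print(transliterate("5\u2032", OLD_RULES, "ascii"))                     # 5?
print(transliterate("5\u2032", NEW_RULES, "ascii"))                     # 5'
print(transliterate("5\u2032", NEW_RULES, "latin-1").encode("latin-1")) # b'5\xb4'
```

With the old rule the only alternative is ACUTE ACCENT, so an ASCII-only
target ends up with "?". Appending APOSTROPHE to the end of the chain leaves
Latin-1 output unchanged and only affects charsets that lack U+00B4.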
--
Alexander E. Patrakov
Submitted By: Alexander E. Patrakov
Date: 2006-01-26
Initial Package Version: 2.3.6
Upstream Status: Discussing
Origin: Alexander E. Patrakov
Description: Transliterates some characters (e.g., ones created by groff -Tutf8)
into their ASCII approximations.
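
Note for anyone editing the C-translit.h.in part by hand: as far as I
remember, the entries in that table have to stay sorted by the input code
point. Treat that as my assumption and check the comment at the top of the
file. Below is a small Python sketch that checks the ordering of a few
entries. The names ENTRY, active_entries and is_sorted are hypothetical
helpers, and the sample is the first hunk with the diff markers removed.

```python
import re

# Matches an active entry such as:  "\x00b7" "." /* <U00B7> MIDDLE DOT */
ENTRY = re.compile(r'^"\\x([0-9a-fA-F]+)"\s+"((?:[^"\\]|\\.)*)"')


def active_entries(lines):
    """Yield (code point, replacement source text) for uncommented entries."""
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:  # lines that start with /* do not match and are skipped
            yield int(match.group(1), 16), match.group(2)


def is_sorted(lines):
    """Check that the code points appear in strictly increasing order."""
    points = [cp for cp, _ in active_entries(lines)]
    return all(a < b for a, b in zip(points, points[1:]))


# Part of the first hunk, with the leading diff markers removed.
HUNK = r'''
"\x00ae" "(R)" /* <U00AE> REGISTERED SIGN */
/* "\x00b4" "'" */ /* <U00B4> ACUTE ACCENT */
"\x00b5" "u" /* <U00B5> MICRO SIGN */
"\x00b7" "." /* <U00B7> MIDDLE DOT */
"\x00b8" "," /* <U00B8> CEDILLA */
'''.splitlines()

print(is_sorted(HUNK))  # True
```

The commented-out ACUTE ACCENT line is ignored by the check, so enabling it
later would need a second look at the ordering.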
--- glibc-2.3.6/locale/C-translit.h.in 2002-04-20 13:16:46.000000000 +0600
+++ glibc-2.3.6/locale/C-translit.h.in 2006-01-26 19:50:35.000000000 +0500
@@ -25,7 +25,9 @@
"\x00ab" "<<" /* <U00AB> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK */
"\x00ad" "-" /* <U00AD> SOFT HYPHEN */
"\x00ae" "(R)" /* <U00AE> REGISTERED SIGN */
+/* "\x00b4" "'" */ /* <U00B4> ACUTE ACCENT */
"\x00b5" "u" /* <U00B5> MICRO SIGN */
+"\x00b7" "." /* <U00B7> MIDDLE DOT */
"\x00b8" "," /* <U00B8> CEDILLA */
"\x00bb" ">>" /* <U00BB> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK */
"\x00bc" " 1/4 " /* <U00BC> VULGAR FRACTION ONE QUARTER */
@@ -52,9 +54,12 @@
"\x01f1" "DZ" /* <U01F1> LATIN CAPITAL LETTER DZ */
"\x01f2" "Dz" /* <U01F2> LATIN CAPITAL LETTER D WITH SMALL LETTER Z */
"\x01f3" "dz" /* <U01F3> LATIN SMALL LETTER DZ */
+"\x02b9" "'" /* <U02B9> MODIFIER LETTER PRIME */
+"\x02ba" "''" /* <U02BA> MODIFIER LETTER DOUBLE PRIME */
"\x02bc" "'" /* <U02BC> MODIFIER LETTER APOSTROPHE */
"\x02c6" "^" /* <U02C6> MODIFIER LETTER CIRCUMFLEX ACCENT */
"\x02c8" "'" /* <U02C8> MODIFIER LETTER VERTICAL LINE */
+/* "\x02ca" "'" */ /* <U02CA> MODIFIER LETTER ACUTE ACCENT */
"\x02cb" "`" /* <U02CB> MODIFIER LETTER GRAVE ACCENT */
"\x02cd" "_" /* <U02CD> MODIFIER LETTER LOW MACRON */
"\x02d0" ":" /* <U02D0> MODIFIER LETTER TRIANGULAR COLON */
@@ -88,6 +93,9 @@
"\x2025" ".." /* <U2025> TWO DOT LEADER */
"\x2026" "..." /* <U2026> HORIZONTAL ELLIPSIS */
"\x202f" " " /* <U202F> NARROW NO-BREAK SPACE */
+"\x2032" "'" /* <U2032> PRIME */
+"\x2033" "''" /* <U2033> DOUBLE PRIME */
+"\x2034" "'''" /* <U2034> TRIPLE PRIME */
"\x2035" "`" /* <U2035> REVERSED PRIME */
"\x2036" "``" /* <U2036> REVERSED DOUBLE PRIME */
"\x2037" "```" /* <U2037> REVERSED TRIPLE PRIME */
@@ -199,6 +207,7 @@
"\x2215" "/" /* <U2215> DIVISION SLASH */
"\x2216" "\\" /* <U2216> SET MINUS */
"\x2217" "*" /* <U2217> ASTERISK OPERATOR */
+"\x2219" "o" /* <U2219> BULLET OPERATOR */
"\x2223" "|" /* <U2223> DIVIDES */
"\x2236" ":" /* <U2236> RATIO */
"\x223c" "~" /* <U223C> TILDE OPERATOR */
@@ -206,8 +215,10 @@
"\x2265" ">=" /* <U2265> GREATER-THAN OR EQUAL TO */
"\x226a" "<<" /* <U226A> MUCH LESS-THAN */
"\x226b" ">>" /* <U226B> MUCH GREATER-THAN */
+"\x22c5" "." /* <U22C5> DOT OPERATOR */
"\x22d8" "<<<" /* <U22D8> VERY MUCH LESS-THAN */
"\x22d9" ">>>" /* <U22D9> VERY MUCH GREATER-THAN */
+"\x22ef" "..." /* <U22EF> MIDLINE HORIZONTAL ELLIPSIS */
"\x2400" "NUL" /* <U2400> SYMBOL FOR NULL */
"\x2401" "SOH" /* <U2401> SYMBOL FOR START OF HEADING */
"\x2402" "STX" /* <U2402> SYMBOL FOR START OF TEXT */
--- glibc-2.3.6/localedata/locales/translit_neutral 2002-04-20 13:14:27.000000000 +0600
+++ glibc-2.3.6/localedata/locales/translit_neutral 2006-01-26 19:39:01.000000000 +0500
@@ -26,6 +26,10 @@
<U00AD> <U002D>
% REGISTERED SIGN
<U00AE> "<U0028><U0052><U0029>"
+% ACUTE ACCENT
+% <U00B4> <U0027>
+% MIDDLE DOT
+<U00B7> <U002E>
% CEDILLA
<U00B8> <U002C>
% RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
@@ -39,9 +43,9 @@
% LATIN SMALL LETTER AE
<U00E6> "<U0061><U0065>"
% MODIFIER LETTER PRIME
-<U02B9> <U2032>;<U00B4>
+<U02B9> <U2032>;<U00B4>;<U0027>
% MODIFIER LETTER DOUBLE PRIME
-<U02BA> <U2033>;"<U00B4><U00B4>"
+<U02BA> <U2033>;"<U00B4><U00B4>";"<U0027><U0027>"
% MODIFIER LETTER TURNED COMMA
<U02BB> <U2018>
% MODIFIER LETTER APOSTROPHE
@@ -56,6 +60,7 @@
<U02C9> <U00AF>
% MODIFIER LETTER ACUTE ACCENT
<U02CA> <U00B4>
+% <U02CA> <U00B4>;<U0027>
% MODIFIER LETTER GRAVE ACCENT
<U02CB> <U0060>
% MODIFIER LETTER LOW MACRON
@@ -101,11 +106,11 @@
% NARROW NO-BREAK SPACE
<U202F> <U00A0>;<U0020>
% PRIME
-<U2032> <U00B4>
+<U2032> <U00B4>;<U0027>
% DOUBLE PRIME
-<U2033> "<U2032><U2032>";"<U00B4><U00B4>"
+<U2033> "<U2032><U2032>";"<U00B4><U00B4>";"<U0027><U0027>"
% TRIPLE PRIME
-<U2034> "<U2032><U2032><U2032>";"<U00B4><U00B4><U00B4>"
+<U2034> "<U2032><U2032><U2032>";"<U00B4><U00B4><U00B4>";"<U0027><U0027><U0027>"
% REVERSED PRIME
<U2035> <U0060>
% REVERSED DOUBLE PRIME
@@ -155,7 +160,7 @@
% ASTERISK OPERATOR
<U2217> <U002A>
% BULLET OPERATOR
-<U2219> <U2022>;<U00B7>
+<U2219> <U2022>;<U00B7>;<U006F>
% DIVIDES
<U2223> <U007C>
% RATIO
@@ -171,13 +176,13 @@
% MUCH GREATER-THAN
<U226B> "<U003E><U003E>"
% DOT OPERATOR
-<U22C5> <U00B7>
+<U22C5> <U00B7>;<U002E>
% VERY MUCH LESS-THAN
<U22D8> "<U003C><U003C><U003C>"
% VERY MUCH GREATER-THAN
<U22D9> "<U003E><U003E><U003E>"
% MIDLINE HORIZONTAL ELLIPSIS
-<U22EF> "<U00B7><U00B7><U00B7>"
+<U22EF> "<U00B7><U00B7><U00B7>";"<U002E><U002E><U002E>"
% SYMBOL FOR NULL
<U2400> "<U004E><U0055><U004C>"
% SYMBOL FOR START OF HEADING
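
For readers not used to the translit_neutral notation in the hunks above:
each rule names a source character followed by alternatives separated by
";", and a multi-character alternative is written in double quotes. Below is
a small Python sketch that decodes one rule line. parse_rule and decode are
hypothetical helper names, the input is a rule line without the leading diff
marker, and "%" is assumed to be the comment character, as in the lines of
this patch.

```python
import re

# A code point written as <UXXXX>, the notation used in translit_neutral.
UCS = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode(symbols):
    """Turn a run such as '<U0027><U0027>' into the string it names."""
    return "".join(chr(int(h, 16)) for h in UCS.findall(symbols))


def parse_rule(line):
    """Split one rule into (source character, list of alternatives).

    Returns None for blank lines and for '%' comment lines.
    """
    line = line.strip()
    if not line or line.startswith("%"):
        return None
    source, alternatives = line.split(None, 1)
    # Alternatives are separated by ';'; the quotes around multi-character
    # alternatives are ignored because only <UXXXX> tokens are decoded.
    return decode(source), [decode(a) for a in alternatives.split(";")]


source, alternatives = parse_rule("<U2219> <U2022>;<U00B7>;<U006F>")
print(hex(ord(source)), [a.encode("unicode_escape") for a in alternatives])
```

For the BULLET OPERATOR rule this gives U+2219 with the alternatives BULLET,
MIDDLE DOT and the letter "o", in that order.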
_______________________________________________
Groff mailing list
Groff@gnu.org
http://lists.gnu.org/mailman/listinfo/groff