I note that much of the documentation still uses a quoting style that
pretends that characters U+0060 and U+0027 are matching opening and closing
quotes, and that new documentation is still being added that follows this
style. For extra credit, they're sometimes redoubled as `` and '' to be
fake double quotes.

The use of the grave accent symbol as if it were a quote mark is visually
asymmetric (ugly!), has semantic conflicts (including with its use as a
shell metacharacter), is in the wrong character class (for line wrapping
and hyphenation), disregards all formal specifications (Unicode-16.0.0
(2024) still says "grave accent"), and is extremely outdated (ASA
X3.4-1963 said "diacritic" 62 years ago). A more thorough analysis is
provided by Markus Kuhn <https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html>.

GNU is the last serious hold-out, and "this is how we've always done it"
won't wash any more.

I propose, at minimum, that the U+0060 grave accent be replaced wherever
it's been misused as an opening quote, but a better result would be to
replace both, using paired Unicode ‘typographic’ quotes where possible.
Wherever redoubled `` '' pairs have been used, they should be replaced by
the corresponding double quote characters.
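
For the mechanical part of that replacement, here is a rough sketch in
Python (chosen purely for illustration). The name modernise_quotes and both
patterns are made up for this example, and the patterns are deliberately
naive: they will trip over apostrophes inside quoted text, shell command
substitution, and m4 sources, so the output would need human review rather
than being applied blindly.

```python
import re

# Hypothetical patterns: a quoted span is assumed to sit on one line and
# to contain no further grave accents or apostrophes.
DOUBLE = re.compile(r"``([^`'\n]+)''")
SINGLE = re.compile(r"`([^`'\n]+)'")


def modernise_quotes(text: str) -> str:
    """Rewrite ``x'' as U+201C x U+201D, and `x' as U+2018 x U+2019."""
    # Handle the redoubled form first, so that its inner characters are
    # not mistaken for a single-quoted span.
    text = DOUBLE.sub("\u201c\\1\u201d", text)
    return SINGLE.sub("\u2018\\1\u2019", text)
```

Running it over a line such as "the ``foo'' option" yields the same line
with U+201C and U+201D around foo; text without a grave accent passes
through unchanged.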

Whether to use Unicode ‘quote’ style, or just stick with ASCII 'quote'
style, depends on context:
* In HTML documentation, not using typographic quotes lacks any reasonable
defence: any program that can show HTML can also cope with Unicode. Any
editor whose keyboard doesn't have typographic quotes can type HTML
entities (&lsquo; &rsquo; &ldquo; &rdquo;) instead.
* Strings that are compiled into Bash have to be displayable on terminals
that lack Unicode support. Either they need to be written in pure ASCII, or
the output function needs to replace typographic quotes with ASCII ones.
(Consider augmenting gettext() to do the latter as its fallback.)
* Man pages, info files, and other stuff that gets locale handling can use
en.UTF-8 as the primary version, and generate C/POSIX (ASCII-only) from
that.
* Translations should be encouraged to use their respective typographic
quoting style: „DE“, »DK«, «FR», ”HE„, „HU”, 『JP』 etc. (See
https://en.wikipedia.org/wiki/Quotation_mark#Summary_table)
* Files with monospaced plaintext (CHANGES, HISTORY, etc.) - either 'ASCII'
quotes or ‘Unicode’ quotes depending on what Chet can type.
* LICENCE/LICENSE - ask the respective licence-holders to provide updated
versions, or to ratify our "translation" (especially GNU & BSD).
* m4 (aka “where did I put my seppuku blade?”) - add
changequote(,)changequote(`,')dnl to the start of all documents that
tacitly assume `', so that this assumption can eventually be deprecated.
* Other stuff - what have I missed?
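
To make the ASCII fallback for compiled-in strings a bit more concrete,
here is a minimal sketch. It is in Python only to keep it short; Bash
itself is written in C, and nothing here is an existing gettext() feature.
The names ASCII_QUOTES and display_text are hypothetical, and only the four
English typographic quotes are handled.

```python
# Map the four English typographic quotes to their ASCII stand-ins.
ASCII_QUOTES = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def display_text(message: str, encoding: str) -> str:
    """Return message as-is if encoding can represent it, otherwise
    return it with typographic quotes replaced by ASCII quotes."""
    try:
        message.encode(encoding)
        return message
    except UnicodeEncodeError:
        # Only quotes are substituted; any other unencodable character
        # is left in place and remains the caller's problem.
        return message.translate(ASCII_QUOTES)
```

In practice the encoding argument might come from something like
locale.getpreferredencoding(False), so a UTF-8 terminal sees the message
untouched while a C/POSIX one gets plain ' and " instead.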

What do others think?

-Martin

PS: arguably I should have started this in coreutils or gnu-policy, but I'm
starting here because ` is syntactically significant to Bash, so there's
extra damage.

PPS: I actually really like m4, with this solitary exception. I would
*love* to see it updated. “m6” anyone?

PPPS: If we're updating LICENCE documents, we should probably replace “(C)”
with “©” (U+00A9).
