[bug #65654] preconv.cpp: Issue a warning if code '0xA0' is used in the input and thus changed to '\~'

Dave Sun, 28 Apr 2024 18:28:00 -0700

Follow-up Comment #2, bug #65654 (group groff):

Bjarni is talking about input, not output.  But the example is still slightly
confusing, because 0xA0 appears to refer to the Latin-1 character NO-BREAK
SPACE (Unicode U+00A0)--but there is no reason to run preconv if the file is
in Latin-1 encoding, as groff can read this directly.


Nonetheless, his point remains: many common Unix tools display the characters
U+0020 and U+00A0 indistinguishably.

But there is no reason for preconv to warn about this.  The same issue exists
no matter what Unix tool processes input containing both characters.  Users
may choose to avoid U+00A0 in their input files for this reason, or they may
use other strategies to deal with it.  It is not preconv's job to police this
usage.  Users who desire such warnings can write a simple preprocessor (using
grep or sed, perhaps) to emit them.
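
For illustration, here is a minimal sketch of such a checker.  I've used Python
rather than grep or sed only to keep it self-contained.  It assumes the input
files are UTF-8; the names find_nbsp and warn_nbsp are made up for this example
and are not part of groff or preconv.

```python
import sys

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def find_nbsp(lines):
    """Yield (line, column) for each U+00A0 in lines; both are 1-based."""
    for lineno, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ch == NBSP:
                yield lineno, col


def warn_nbsp(path):
    """Write one warning per U+00A0 in the file to stderr; return the count."""
    count = 0
    # Assumption: the file is UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        for lineno, col in find_nbsp(f):
            print(f"{path}:{lineno}:{col}: warning: U+00A0 NO-BREAK SPACE",
                  file=sys.stderr)
            count += 1
    return count


if __name__ == "__main__":
    # Check every file named on the command line.
    for path in sys.argv[1:]:
        warn_nbsp(path)
```

Saved under some name of your choosing, this would be run on the document
before handing it to groff; it only reports and leaves the input untouched.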

Once you start down the rabbit hole of "warn the user about characters that
are hard to visually tell apart," where do you stop?  In the monospace fonts
used in most terminals, you'd be hard-pressed to distinguish U+2012 FIGURE
DASH from U+2013 EN DASH.  Unicode has a plethora of space-like and dash-like
characters.  Should preconv warn about all of these?  That seems absurd.
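
To get a feel for how many there are, the sketch below lists them using
Python's unicodedata module.  Treating general category Zs (space separators)
as "space-like" and Pd (dash punctuation) as "dash-like" is my own
approximation, and the totals depend on the Unicode version your Python ships
with.  The helper names are invented for this example.

```python
import sys
import unicodedata


def chars_in_category(category):
    """Return every character whose Unicode general category matches."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == category]


def describe(ch):
    """Format a character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


if __name__ == "__main__":
    # Assumption: Zs approximates "space-like" and Pd "dash-like".
    for category in ("Zs", "Pd"):
        for ch in chars_in_category(category):
            print(category, describe(ch))
```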


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?65654>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
