[bug #65108] [troff] support construction of general file name request arguments

Dave Tue, 03 Sep 2024 20:49:30 -0700

Follow-up Comment #11, bug #65108 (group groff):

[comment #0 original submission:]
> we have no way of knowing what the file system's character encoding is.
> Might be ISO 8859-1, UTF-8, UTF-16BE/LE, or something else entirely.


I'm no longer sure that's a meaningful question.  The file system seems to
just store the file name as a string of bytes and leave it up to the shell
to interpret them.

$ mkdir foo
$ cd foo
$ echo résumé | iconv -tutf8 | xargs touch
$ echo résumé | iconv -tlatin1 | xargs touch
$ echo * | od -c
0000000   r 303 251   s   u   m 303 251       r 351   s   u   m 351  \n
0000020

Then a UTF-8 shell produces:

$ ls
 résumé  'r'$'\351''sum'$'\351'

and a Latin-1 shell produces:

$ ls
rÃ©sumÃ©  résumé

That is, both file names are valid (but different) strings of Latin-1
characters.  In UTF-8, one of them is a string of valid characters and the
other contains two invalid bytes (\351, the Latin-1 encoding of é).
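The UTF-8 half of that claim can be checked mechanically.  Below is a
minimal POSIX shell sketch, not part of the original comment.  The function
names is_valid_utf8 and classify_dir are hypothetical.  The sketch assumes an
iconv (such as glibc's) that exits with a nonzero status on an illegal or
incomplete input sequence.  It prints each name's raw bytes in hex and says
whether they form valid UTF-8.  No Latin-1 check is needed, because
ISO 8859-1 assigns a character to every one of the 256 byte values.

```sh
# Hypothetical helper: succeeds only if the bytes of $1 are valid UTF-8.
# Assumes iconv exits nonzero on an illegal or incomplete input sequence.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Hypothetical helper: for each non-hidden entry in directory $1, print the
# name's raw bytes as hex followed by "valid-utf8" or "invalid-utf8".
# The body runs in a subshell so the cd does not affect the caller.
classify_dir() (
    cd "$1" || exit 1
    for name in *; do
        [ -e "$name" ] || continue    # glob did not match: empty directory
        # -v disables od's duplicate-line suppression; tr joins the hex bytes
        hex=$(printf '%s' "$name" | od -An -v -tx1 | tr -d ' \n')
        if is_valid_utf8 "$name"; then
            printf '%s valid-utf8\n' "$hex"
        else
            printf '%s invalid-utf8\n' "$hex"
        fi
    done
)
```

Run as "classify_dir foo" against the directory from the session above, it
should report 72c3a973756dc3a9 valid-utf8 for the UTF-8 name and
72e973756de9 invalid-utf8 for the Latin-1 one.  The two lines come out in
whatever order the locale's collation sorts the glob.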

This is an ext4 file system, but I would imagine any other Unix-based one
would have to work the same in order to interact with shells consistently.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?65108>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
