Follow-up Comment #11, bug #65108 (group groff):

[comment #0 original submission:]
> we have no way of knowing what the file system's character encoding is.
> Might be ISO 8859-1, UTF-8, UTF-16BE/LE, or something else entirely.

I'm not sure now if that's a meaningful question.  The file system seems to
just store a string of bytes as the file name, and leave it up to the shell
how to interpret that.

$ mkdir foo
$ cd foo
$ echo résumé | iconv -tutf8 | xargs touch
$ echo résumé | iconv -tlatin1 | xargs touch
$ echo * | od -c
0000000   r 303 251   s   u   m 303 251       r 351   s   u   m 351  \n
0000020

Then a UTF-8 shell produces:

$ ls
résumé 'r'$'\351''sum'$'\351'

and a Latin-1 shell produces:

$ ls
résumé résumé

That is, both filenames are valid (but different) strings of Latin-1
characters.  In UTF-8, one of them is a string of valid characters, and one
has two invalid bytes in it.

This is an ext4 file system, but I would imagine any other Unix-based one
would have to work the same in order to interact with shells consistently.

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?65108>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
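
As a small cross-check of the transcript above, the two names can be built directly from their bytes and tested for UTF-8 validity without touching the file system. This is only a sketch: `is_valid_utf8` is a hypothetical helper, not part of groff or of any standard tool, and it assumes an `iconv` that exits non-zero on an invalid input sequence (as the glibc and libiconv versions do).

```shell
#!/bin/sh
# Hypothetical helper (not a standard utility): succeeds only if the
# argument is a well-formed UTF-8 byte string.  Assumes iconv exits
# non-zero when it meets an invalid input sequence.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# The two file names from the transcript, spelled as raw bytes (octal).
utf8_name=$(printf 'r\303\251sum\303\251')   # UTF-8 encoding, 8 bytes
latin1_name=$(printf 'r\351sum\351')         # Latin-1 encoding, 6 bytes

for name in "$utf8_name" "$latin1_name"; do
    # Count bytes, not characters, so the result is locale-independent.
    nbytes=$(printf '%s' "$name" | wc -c | tr -d ' ')
    if is_valid_utf8 "$name"; then
        echo "$nbytes bytes: valid UTF-8"
    else
        echo "$nbytes bytes: not valid UTF-8"
    fi
done
```

Every byte value is a defined ISO 8859-1 character, so the same check with `-f ISO-8859-1` would accept both names, which matches what the Latin-1 shell showed.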