Cedric Blancher via Cygwin wrote:
On Sat, 23 Nov 2024 at 11:44, Cedric Blancher <cedric.blanc...@gmail.com> wrote:
Good morning!
/bin/ls -l cannot handle printable Unicode characters outside the BMP.

Example using '𝒯'
bash -c 'printf "\U0001D4AF\n"' # MATHEMATICAL SCRIPT CAPITAL T
(yes, our mathematicians want to use THAT as a file name)

On Linux:
LC_ALL=en_US.UTF-8 bash -c 't="$(printf "\U0001D4AF\n")" ; touch "$t" "$t$t"'
ls -la
total 8
-rw-r--r--  1 ced staden  0 Nov 23 11:29 ΓΆΓΆΓΆΓΆΓΆΓΆΓΆ
-rw-r--r--  2 ced staden  4 Nov 23 11:31 𝒯
-rw-r--r--  2 ced staden  4 Nov 23 11:31 𝒯𝒯

On Cygwin:
LC_ALL=en_US.UTF-8 bash -c 't="$(printf "\U0001D4AF\n")" ; touch "$t" "$t$t"'
$ ls -la
-rw-r--r-- 1 ced staden  0 Nov 23 11:29  ΓΆΓΆΓΆΓΆΓΆΓΆΓΆ
-rw-r--r-- 2 ced staden  4 Nov 23 11:31 ''$'\360\235\222\257'
-rw-r--r-- 2 ced staden  4 Nov 23 11:31 ''$'\360\235\222\257\360\235\222\257'

Looks like the Cygwin locale has a problem with non-BMP chars.
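
For reference, the octal escapes in the Cygwin listing can be decoded by hand to confirm that they are simply the UTF-8 bytes of U+1D4AF, i.e. the file name itself is intact and only its display differs. The sketch below does that arithmetic in plain shell. The helper name decode_utf8_4 is hypothetical, and the UTF-16 surrogate pair is printed only because the character lies outside the BMP; whether the 16-bit wchar_t on Cygwin is what actually trips ls is an assumption, not something this thread establishes.

```shell
#!/bin/sh
# Hypothetical helper: decode one 4-byte UTF-8 sequence into its Unicode
# code point and the UTF-16 surrogate pair that represents it.
decode_utf8_4() {
    # Strip the UTF-8 marker bits and join the payload bits (3 + 6 + 6 + 6).
    cp=$(( (($1 & 0x07) << 18) | (($2 & 0x3F) << 12) | (($3 & 0x3F) << 6) | ($4 & 0x3F) ))
    # Non-BMP code points are split into a high and a low surrogate.
    v=$(( cp - 0x10000 ))
    hi=$(( 0xD800 + (v >> 10) ))
    lo=$(( 0xDC00 + (v & 0x3FF) ))
    printf 'U+%X = UTF-16 %X %X\n' "$cp" "$hi" "$lo"
}

# The escapes \360 \235 \222 \257 from the listing, passed as octal constants.
decode_utf8_4 0360 0235 0222 0257
# prints: U+1D4AF = UTF-16 D835 DCAF
```

So the bytes shown by ls on Cygwin are one well-formed 4-byte UTF-8 character, the same one Linux prints as a glyph.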
find(1) is even worse:
$ find .
.
./ΓΆΓΆΓΆΓΆΓΆΓΆΓΆ
./????
./x??x

The Microsoft Explorer GUI shows the file names correctly, so IMO this
is not a Windows or Win32 API problem.
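
One way to check this from the shell, without relying on how ls or find decide what is printable, is to dump the raw bytes of each directory entry. This is only a sketch: dump_names is a hypothetical helper, the scratch directory comes from mktemp, and the test name is built from the octal escapes so that it does not depend on the locale of the shell that runs it.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every entry in directory $1
# as a string of hex bytes, one entry per line.
dump_names() {
    for f in "$1"/*; do
        [ -e "$f" ] || continue          # skip the unexpanded pattern in an empty dir
        printf '%s' "${f##*/}" | od -An -tx1 | tr -d ' \n'
        printf '\n'
    done
}

scratch=$(mktemp -d)
t=$(printf '\360\235\222\257')           # UTF-8 bytes of U+1D4AF
touch "$scratch/$t" "$scratch/$t$t"
dump_names "$scratch"
rm -rf "$scratch"
```

If the two lines come back as f09d92af and f09d92aff09d92af, the names were stored and read back unchanged, which would point at the display path in ls and find rather than at the file system layer.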
A slightly different file name problem, which may or may not be related:
https://sourceware.org/pipermail/cygwin/2024-September/256451.html

--
Regards,
Christian

