Cedric Blancher via Cygwin wrote:
On Sat, 23 Nov 2024 at 11:44, Cedric Blancher <cedric.blanc...@gmail.com> wrote:
Good morning!
/bin/ls -l on Cygwin cannot display printable Unicode characters outside the BMP.
Example using '𝒯':
bash -c 'printf "\U0001D4AF\n"' # MATHEMATICAL SCRIPT CAPITAL T
(yes, our mathematicians want to use THAT as file name)
On Linux:
$ LC_ALL=en_US.UTF-8 bash -c 't="$(printf "\U0001D4AF\n")" ; touch "$t" "$t$t"'
$ ls -la
total 8
-rw-r--r-- 1 ced staden 0 Nov 23 11:29 ööööööö
-rw-r--r-- 2 ced staden 4 Nov 23 11:31 𝒯
-rw-r--r-- 2 ced staden 4 Nov 23 11:31 𝒯𝒯
On Cygwin:
$ LC_ALL=en_US.UTF-8 bash -c 't="$(printf "\U0001D4AF\n")" ; touch "$t" "$t$t"'
$ ls -la
-rw-r--r-- 1 ced staden 0 Nov 23 11:29 ööööööö
-rw-r--r-- 2 ced staden 4 Nov 23 11:31 ''$'\360\235\222\257'
-rw-r--r-- 2 ced staden 4 Nov 23 11:31 ''$'\360\235\222\257\360\235\222\257'
Looks like the Cygwin locale has a problem with non-BMP chars.
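For reference, the octal escapes in the Cygwin listing are exactly the UTF-8 encoding of U+1D4AF, so the name itself appears to be stored intact and only its display differs. A minimal sketch to check this; the helper name name_bytes is made up for illustration:

```shell
# Hypothetical helper: emit the four bytes shown in the quoted ls output.
name_bytes() { printf '\360\235\222\257'; }

# Dump them in hex; these are the UTF-8 bytes F0 9D 92 AF of U+1D4AF.
name_bytes | od -An -tx1

# Count the bytes: a non-BMP character takes four bytes in UTF-8.
name_bytes | wc -c
```

Octal escapes are used instead of \U so that the result does not depend on the locale of the shell running it.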
find(1) is even worse:
$ find .
.
./ööööööö
./????
./x??x
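GNU find sanitizes unusual characters in -print output when it writes to a terminal, so the '?' marks may not reflect what is actually on disk. A sketch, assuming a find that supports -print0 (GNU and BSD find do), that dumps the raw name bytes from a scratch directory instead:

```shell
# Work in a throwaway directory so nothing in the current one is touched.
dir=$(mktemp -d)

# Create a file whose name is the UTF-8 encoding of U+1D4AF.
touch "$dir/$(printf '\360\235\222\257')"

# -print0 writes names verbatim; od shows the bytes regardless of locale.
raw=$(cd "$dir" && find . -type f -print0 | od -An -tx1)
echo "$raw"

rm -rf "$dir"
```

If the hex dump contains f0 9d 92 af, the file name is intact and only the terminal rendering is affected.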
The Microsoft Explorer GUI shows the file names correctly, so IMO this
is not a Windows or Win32 API problem.
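Windows stores file names as UTF-16, where a character outside the BMP takes two 16-bit code units (a surrogate pair). Whether this is what trips up the Cygwin locale code is an assumption, not something established in this thread; the sketch below only shows the arithmetic for U+1D4AF:

```shell
cp=$((0x1D4AF))                  # code point of MATHEMATICAL SCRIPT CAPITAL T
v=$((cp - 0x10000))              # 20-bit offset past the end of the BMP
hi=$((0xD800 + (v >> 10)))       # high (lead) surrogate: top 10 bits
lo=$((0xDC00 + (v & 0x3FF)))     # low (trail) surrogate: bottom 10 bits
pair=$(printf '%04X %04X' "$hi" "$lo")
echo "$pair"                     # prints: D835 DCAF
```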
A slightly different filename problem, which may or may not be related:
https://sourceware.org/pipermail/cygwin/2024-September/256451.html
--
Regards,
Christian