On 8/7/20 8:06 AM, sunnycemet...@gmail.com wrote:
... possibly. Please see for yourself:
■ LC_ALL=C ls -l
total 1
-rw-r--r-- 1 userx userx 0 Aug 7 08:35 ''$'\325\253\302\265\366''+'$'\325\361\275\322\374\253\322\342\203\322\351''+'$'\322\351\245\322\342\304\264''+'$'\364''rd'$'\264''+'$'\342''07.srt'
■ echo $LANG
ja_JP.utf8
■ find -name '*.srt'
■ LC_ALL=C find -name '*.srt'
./?????+???????????+???????+?rd?+?07.srt
I have attached logs of the following debug command for either locale,
with ‘ and ’ replaced with ' for quick diff comparison. Debug output
does not elucidate much, but perhaps someone can shed light on how such
a seemingly simple search could possibly fail (or even be affected by
locale in the first place).
find -D all -name '*.srt'
'find' is not part of coreutils. That said, you are correct that
globbing is locale-sensitive. Your filename contains byte sequences that
are invalid in the encoding of some locales (such as ja_JP.utf8) but
valid in others (such as C). But POSIX says that the '*'
glob only has to match characters, not encoding errors. So your choice
of locale (and thus which byte sequences are valid characters) indeed
affects the results of the glob, and therefore what find is able to output.
I would argue that this is not a bug, but you may get other opinions if
you ask on bug-findutils.
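
To make the rule concrete, here is a minimal Python sketch. It is only a
model of the behavior described above, not how find or fnmatch() is
actually implemented. The helper name glob_matches is hypothetical, and
"latin-1" is my stand-in for a single-byte locale such as C, where every
byte counts as a character. The filename bytes are copied from the ls
output quoted above.

```python
import fnmatch

# Filename bytes copied from the `ls` output (same octal escapes).
NAME = (
    b"\325\253\302\265\366+"
    b"\325\361\275\322\374\253\322\342\203\322\351+"
    b"\322\351\245\322\342\304\264+"
    b"\364rd\264+"
    b"\34207.srt"
)


def glob_matches(name: bytes, pattern: str, encoding: str) -> bool:
    """Hypothetical model: a glob can match only if every byte sequence
    in `name` is a valid character in the given encoding."""
    try:
        text = name.decode(encoding)
    except UnicodeDecodeError:
        # The name contains an encoding error, so '*' has nothing
        # it is required to match.
        return False
    return fnmatch.fnmatchcase(text, pattern)


if __name__ == "__main__":
    # Single-byte encoding (assumed stand-in for LC_ALL=C): every byte is a character.
    print(glob_matches(NAME, "*.srt", "latin-1"))  # True
    # UTF-8 (as in ja_JP.utf8): byte \366 can never appear in valid UTF-8.
    print(glob_matches(NAME, "*.srt", "utf-8"))    # False
```

The same name matches under the single-byte encoding and fails under
UTF-8, which mirrors the two find runs above.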
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org