On 26. 5. 25 19:24, rin...@apache.org wrote:
Author: rinrab
Date: Mon May 26 17:24:16 2025
New Revision: 1925834
URL: http://svn.apache.org/viewvc?rev=1925834&view=rev
Log:
On the 'utf8-cmdline-prototype' branch: Store utf8-ized message in opt_state.
The reason this is stored unchanged is so that there's no conversion
back from UTF-8 for the -F check.
* subversion/svn/cl.h
(svn_cl__opt_state_t::message): Adjust comment.
* subversion/svn/svn.c
(sub_main::--message): Use utf8_opt_arg instead of opt_arg.
(sub_main::message-path-check): Use svn_io_stat() instead of apr_stat() to
check path, because it accepts utf8 path, unlike apr.
Right here you now have a native -> UTF-8 -> native double conversion.
These can be problematic, especially when they interact with the
filesystem, not just the OS. Yes, a filesystem can use a different
encoding than other parts of the OS. That used to be the case on macOS
before they switched from HFS+ to APFS. HFS+ used UTF-8 transcribed from
Unicode normalisation form D, whereas the runtime internally used form
C, with sometimes hilarious effects. We never did fix that problem.
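To make the form C / form D mismatch concrete, here is a minimal Python sketch. It is not Subversion code, and the name "café" is only an illustrative file name. Using nothing but the standard `unicodedata` module, it shows that one visible name has two different UTF-8 byte sequences, so a byte-wise path comparison fails unless both sides are normalised first.

```python
import unicodedata

# Illustrative file name: "café" with a precomposed e-acute (form C).
name = "caf\u00e9"

nfc = unicodedata.normalize("NFC", name)  # composed: U+00E9
nfd = unicodedata.normalize("NFD", name)  # decomposed: "e" + U+0301

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(nfc_bytes)  # b'caf\xc3\xa9'
print(nfd_bytes)  # b'cafe\xcc\x81'

# Same name to a human, different bytes to a byte-wise comparison.
same_bytes = nfc_bytes == nfd_bytes

# The two only compare equal after normalising to a common form.
same_after_normalising = unicodedata.normalize("NFC", nfd) == nfc
```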
On Unix, ZFS can be configured to use several different name encodings
and Unicode normalisations, so this issue is hardly specific to macOS.
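A rough way to see why the native -> UTF-8 -> native trip is not automatically safe is sketched below in Python. Everything in it is an assumption made for illustration: `round_trip` is a hypothetical helper, and the `fs_form` parameter stands in for whatever layer (filesystem or runtime) renormalises the name along the way. It does not model what APR or Subversion actually do.

```python
import unicodedata
from typing import Optional


def round_trip(native: bytes, encoding: str, fs_form: str) -> Optional[bytes]:
    """Hypothetical helper: native -> UTF-8 -> renormalise -> native.

    Returns None when the name can no longer be expressed in the
    native encoding after the intermediate normalisation step.
    """
    utf8 = native.decode(encoding).encode("utf-8")
    # Assumption: some layer stores or returns the name in fs_form.
    stored = unicodedata.normalize(fs_form, utf8.decode("utf-8"))
    try:
        return stored.encode(encoding)
    except UnicodeEncodeError:
        return None


latin1_name = b"caf\xe9"      # "café" in Latin-1
utf8_name = b"caf\xc3\xa9"    # "café" in UTF-8, form C

# No renormalisation: the bytes survive.
print(round_trip(latin1_name, "latin-1", "NFC") == latin1_name)  # True

# Form D introduces U+0301, which Latin-1 cannot encode at all.
print(round_trip(latin1_name, "latin-1", "NFD"))  # None

# With a UTF-8 locale the trip "succeeds" but yields different bytes.
print(round_trip(utf8_name, "utf-8", "NFD") == utf8_name)  # False
```

The last case is the nasty one: nothing fails, but the bytes that come back no longer match the bytes the user typed.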
On Windows ... well, just remember that your network shared folder can
live on a Unix machine running Samba and using native Shift-JIS
encoding, for example. Only NTFS and its derivatives are guaranteed to
use UTF-16-LE and normalisation form C (and even then, there are all
those plane-1 characters like 👀 <- that one for example, that are
represented by two UTF-16 code units, i.e. a surrogate pair).
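For the UTF-16 point, a small Python sketch (illustrative only, standard library only) shows that the single character 👀 (U+1F440) occupies two 16-bit units when encoded as UTF-16-LE. Code that counts or truncates names in 16-bit units has to keep the pair together.

```python
eyes = "\U0001F440"  # the 👀 character, in plane 1

utf16 = eyes.encode("utf-16-le")

# Split the encoded bytes into 16-bit little-endian units.
units = [int.from_bytes(utf16[i:i + 2], "little")
         for i in range(0, len(utf16), 2)]

print(len(eyes))                # 1 (one code point)
print(len(utf16))               # 4 (bytes)
print([hex(u) for u in units])  # ['0xd83d', '0xdc40']
```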
I'd be careful when changing how and when we convert stuff to UTF-8.
It's not always obviously wrong, as you seem to assume. :)
-- Brane