On 26. 5. 25 19:24, rin...@apache.org wrote:
Author: rinrab
Date: Mon May 26 17:24:16 2025
New Revision: 1925834

URL: http://svn.apache.org/viewvc?rev=1925834&view=rev
Log:
On the 'utf8-cmdline-prototype' branch: Store utf8-ized message in opt_state.


The reason this is stored unchanged is so that there's no conversion back from UTF-8 for the -F check.


* subversion/svn/cl.h
   (svn_cl__opt_state_t::message): Adjust comment.
* subversion/svn/svn.c
   (sub_main::--message): Use utf8_opt_arg instead of opt_arg.
   (sub_main::message-path-check): Use svn_io_stat() instead of apr_stat() to
    check path, because it accepts utf8 path, unlike apr.

Right here you now have a native -> UTF-8 -> native double conversion. These can be problematic, especially when they interact with the filesystem, not just the OS. Yes, a filesystem can use a different encoding than other parts of the OS. That used to be the case on macOS before they switched from HFS+ to APFS. HFS+ used UTF-8 transcribed from Unicode normalisation form D, whereas the runtime internally used form C, with sometimes hilarious effects. We never did fix that problem.
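
To make the form C versus form D mismatch concrete, here is a minimal Python sketch. It uses only the standard unicodedata module; the file name is made up for illustration and none of this is Subversion code. It shows that the same visible name produces two different UTF-8 byte strings, so a round trip that silently renormalises can yield a name that no longer matches byte for byte.

```python
import unicodedata

# Hypothetical file name containing one precomposed character (U+00E9).
name_nfc = unicodedata.normalize("NFC", "caf\u00e9.txt")
# The same name decomposed: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
name_nfd = unicodedata.normalize("NFD", name_nfc)

nfc_bytes = name_nfc.encode("utf-8")
nfd_bytes = name_nfd.encode("utf-8")

print(nfc_bytes)             # b'caf\xc3\xa9.txt'
print(nfd_bytes)             # b'cafe\xcc\x81.txt'
print(name_nfc == name_nfd)  # False: a plain comparison sees two names

# Comparing after normalising both sides to one form makes them equal again.
print(unicodedata.normalize("NFC", name_nfd) == name_nfc)  # True
```

Any code that compares or looks up paths by raw bytes after such a conversion is exposed to this, which is why an extra native -> UTF-8 -> native pass is not automatically harmless.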

On Unix, ZFS can be configured to use several different name encodings and Unicode normalisations, so this issue is hardly specific to macOS. On Windows ... well, just remember that your network shared folder can live on a Unix machine running Samba and using native Shift-JIS encoding, for example. Only NTFS and its derivatives are guaranteed to use UTF-16-LE and normalisation form C (and even then, there are all those plane-1 codes like 👀 <- that one for example, that are represented by two UTF-16 code units, i.e. a surrogate pair).
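
The plane-1 remark can be checked with a short Python sketch (standard library only, purely illustrative): a single code point outside the Basic Multilingual Plane occupies two 16-bit units in UTF-16-LE, so character counts and 16-bit unit counts diverge.

```python
# U+1F440 EYES lies in plane 1, outside the Basic Multilingual Plane.
eyes = "\U0001F440"

utf16le = eyes.encode("utf-16-le")

# Split the encoded bytes into 16-bit little-endian units.
units = [
    int.from_bytes(utf16le[i:i + 2], "little")
    for i in range(0, len(utf16le), 2)
]

print(len(eyes))                 # 1 code point
print([hex(u) for u in units])   # ['0xd83d', '0xdc40'], a surrogate pair
print(len(eyes.encode("utf-8"))) # 4 bytes in UTF-8
```

Code that sizes buffers or truncates names by counting 16-bit units has to treat the pair as one character.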

I'd be careful when changing how and when we convert stuff to UTF-8. It's not always obviously wrong, as you seem to assume. :)

-- Brane
