svn: cl.h svn.c util.c

Branko Čibej Wed, 28 May 2025 04:12:27 -0700

On 26. 5. 25 19:24, rin...@apache.org wrote:

Author: rinrab
Date: Mon May 26 17:24:16 2025
New Revision: 1925834


URL: http://svn.apache.org/viewvc?rev=1925834&view=rev
Log:
On the 'utf8-cmdline-prototype' branch: Store utf8-ized message in opt_state.

The reason this is stored unchanged is so that there's no conversion back from UTF-8 for the -F check.

* subversion/svn/cl.h
   (svn_cl__opt_state_t::message): Adjust comment.
* subversion/svn/svn.c
   (sub_main::--message): Use utf8_opt_arg instead of opt_arg.
   (sub_main::message-path-check): Use svn_io_stat() instead of apr_stat() to
    check path, because it accepts utf8 path, unlike apr.

Right here you now have a native -> UTF-8 -> native double conversion. These can be problematic, especially when they interact with the filesystem, not just the OS. Yes, a filesystem can use a different encoding than other parts of the OS. That used to be the case on macOS before they switched from HFS+ to APFS. HFS+ used UTF-8 transcribed from Unicode normalisation form D, whereas the runtime internally used form C, with sometimes hilarious effects. We never did fix that problem.
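To make the form C / form D mismatch concrete, here is a minimal Python sketch. It is not Subversion code; the file name and the `round_trip_via_nfd` helper are made up for illustration, and the helper only imitates a layer that decomposes names the way HFS+ did. It shows that the same visible name has two different UTF-8 byte sequences, so a round trip through such a layer does not hand back the bytes you started with.

```python
import unicodedata

# Hypothetical file name, used only for illustration.
name = "caf\u00e9.txt"  # 'é' as one precomposed code point (form C)

nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)  # 'e' followed by U+0301 (form D)

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")


def round_trip_via_nfd(utf8_bytes: bytes) -> bytes:
    """Imitate (as an assumption) a layer that stores names decomposed."""
    text = utf8_bytes.decode("utf-8")
    return unicodedata.normalize("NFD", text).encode("utf-8")


print(nfc_bytes)
print(nfd_bytes)
# The two names render identically but compare unequal byte for byte.
print(nfc_bytes == nfd_bytes)
print(round_trip_via_nfd(nfc_bytes) == nfc_bytes)
```

A byte-wise comparison of the path before and after such a round trip would report a difference even though nothing visible changed, which is the kind of surprise described above.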

On Unix, ZFS can be configured to use several different name encodings and Unicode normalisations, so this issue is hardly specific to macOS. On Windows ... well, just remember that your network shared folder can live on a Unix machine running Samba and using native Shift-JIS encoding, for example. Only NTFS and its derivatives are guaranteed to use UTF-16-LE and normalisation form C (and even then, there are all those plane-1 codes like 👀 <- that one for example, that are represented by two UTF-16 code units).
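The two encoding points above can be checked with a short Python sketch. Again this is only an illustration under stated assumptions: `can_round_trip` is a hypothetical helper, and Python's `shift_jis` codec stands in for whatever a Samba share would actually use. The first part splits 👀 into its UTF-16 code units; the second shows that a name which survives UTF-8 may not be representable in a legacy native encoding at all.

```python
eyes = "\U0001F440"  # the eyes emoji, U+1F440, in plane 1

# Split the UTF-16-LE encoding into 16-bit code units.
utf16 = eyes.encode("utf-16-le")
units = [int.from_bytes(utf16[i:i + 2], "little")
         for i in range(0, len(utf16), 2)]
print([hex(u) for u in units])  # one code point, two code units


def can_round_trip(text: str, encoding: str) -> bool:
    """Return True if text survives encode + decode in the given encoding."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        return False


japanese = "\u65e5\u672c\u8a9e"  # "日本語"
print(can_round_trip(japanese, "shift_jis"))
print(can_round_trip(eyes, "shift_jis"))
print(len(japanese.encode("shift_jis")), len(japanese.encode("utf-8")))
```

The same name also has a different byte length in each encoding, so any code that treats the converted and unconverted forms as interchangeable is relying on an assumption that the filesystem does not have to honour.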

I'd be careful when changing how and when we convert stuff to UTF-8. It's not always obviously wrong, as you seem to assume. :)


-- Brane

Re: svn commit: r1925834 - in /subversion/branches/utf8-cmdline-prototype/subversion/svn: cl.h svn.c util.c
