On Mon, May 26, 2025 at 10:07 PM Branko Čibej <br...@apache.org> wrote:
> On 26. 5. 25 21:25, Timofei Zhakov wrote: > > On Mon, May 26, 2025 at 8:29 PM Branko Čibej <br...@apache.org> wrote: > >> On 26. 5. 25 14:07, Timofei Zhakov wrote: >> >> On Tue, May 20, 2025 at 9:29 PM Branko Čibej <br...@apache.org> wrote: >> >>> On 20. 5. 25 20:31, rin...@apache.org wrote: >>> >>> Author: rinrab >>> Date: Tue May 20 18:31:35 2025 >>> New Revision: 1925722 >>> >>> URL: http://svn.apache.org/viewvc?rev=1925722&view=rev >>> Log: >>> On the 'utf8-cmdline-prototype' branch: add a new function that retrieves >>> target list from args, instead of converting the encoding, assumes that >>> args have already been converted. >>> >>> >>> >>> Once again, why don't you use apr_app_initialize() that does all that >>> and more, instead of perpetuating the same problem? As long as this is used >>> on Windows from main() instead of wmain(), there's no way it will work >>> consistently. >>> >> >> First I wanna mention that this exact change is still needed even if we >> decide to use apr app to host the binary. At this point there is no >> difference between using apr app and svn's implementation of wmain and main >> wrappers. >> >> Also, I'd like to argue about using apr app in general. The subversion's >> command line wasn't using this library during the whole history (as far as >> I know), so I don't think it's correct to start using it now. What would we >> do if we need some functionality apr app doesn't implement? I mean that >> would be a quite huge change... >> >> In my opinion, apr app is not something fully ready to use in subversion. >> For example, it doesn't wrap main on unix platforms, while it still can be >> executed with non-utf8-ized arguments. In wmain() apr_conv_ucs2_to_utf8() >> just ignores the error, which is bad (on win32). >> >> And finally, the code we have right now is pretty good, and it seems >> possible to support non-ASCII arguments (which I'm working on). >> >> I hope this makes some sense. >> >> >> >> OK, fine, but I still don't see where you're converting the argument list >> directly from UTF-16-LE to UTF-8. svn_cmdline__win32_get_cstring_argv >> doesn't do that, and it should. >> >> That's the main problem on Windows: that function is converting to the >> local multibyte character set, then svn_client_args_to_target_array2 >> converts /that/ to UTF-8. That's fine-ish on Unix, where we don't convert >> twice, but completely brain-dead on Windows. And I freely admit I had a lot >> do do with that brain-deadedness. >> >> This is what should be happening, IMO: >> >> 1. deprecate svn_client_args_to_target_array2 >> 2. rename your svn_client_args_to_target_array_utf8 >> svn_client_args_to_target_array3 and make it explicitly require that the >> args are already in UTF-8 >> 3. svn_cmdline__win32_get_cstring_argv must convert to UTF-8 >> 4. svn_cmdline__default_get_cstring_argv must *also* convert to UTF-8 >> >> >> You've done steps 1 and 2, 3 should be trivial and 4 will make the result >> compatible with 3. >> >> -- Brane >> > > IMHO, it's much better to follow my way instead, in which I'm adding utf8 > alternatives for those functions, without deprecating old ones, let's call > them cstring. > > > > The old ones are wrong, period. Subversion's internal character set has > been UTF-8 practically from day 1. svn_client_args_to_target_array2 is > violating that convention. We have conversion functions. > > > Oh, exactly! You are right and this finally makes sense. I didn't initially think about that. Let me quickly fix that... > > This is exactly what happened with svn_cmdline__get_cstring_argv(). I > added a new function svn_cmdline__get_utf8_argv(), that returns a utf8 > string instead. Then we can switch the one used in each executable > one-by-one. > > > Those are private functions, you can do whatever you want with them, as > long as you end up with only one version of each instead of, I dunno, five. > > Those are also transitional functions needed to migrate all usages smoothly. Probably it makes sense to drop the old ones then, but it would not be a big deal for us in future... > > The same thing with svn_client_args_to_target_array_utf8, and also this > difference in names is also helpful to differentiate between cstring and > utf8 versions much more than version bump. Also the old one has not been > deprecated, because someone may still want this function to convert > encoding. > > > Except for the encoding conversion functions, *none* of our API has utf8 > in the name, because that's understood and implied. > svn_client_args_to_target_array2 is one of the few exceptions that converts > between encoding without saying so explicitly in the name, and it's been a > pain to deal with precisely because of that. > > Our API requires that strings are in UTF-8. The fewer exceptions we have > to that, the better. > > Now, I see. > > Let's not overcomplicate things if possible. > > > Yes, exactly. Let's not maintain multiple incompatible versions of the > same function. I completely agree. > Yay! > > -- Brane > -- Timofei Zhakov