On Mon, May 26, 2025 at 10:07 PM Branko Čibej <br...@apache.org> wrote:

> On 26. 5. 25 21:25, Timofei Zhakov wrote:
>
> On Mon, May 26, 2025 at 8:29 PM Branko Čibej <br...@apache.org> wrote:
>
>> On 26. 5. 25 14:07, Timofei Zhakov wrote:
>>
>> On Tue, May 20, 2025 at 9:29 PM Branko Čibej <br...@apache.org> wrote:
>>
>>> On 20. 5. 25 20:31, rin...@apache.org wrote:
>>>
>>> Author: rinrab
>>> Date: Tue May 20 18:31:35 2025
>>> New Revision: 1925722
>>>
>>> URL: http://svn.apache.org/viewvc?rev=1925722&view=rev
>>> Log:
>>> On the 'utf8-cmdline-prototype' branch: add a new function that retrieves
>>> target list from args, instead of converting the encoding, assumes that
>>> args have already been converted.
>>>
>>>
>>>
>>> Once again, why don't you use apr_app_initialize() that does all that
>>> and more, instead of perpetuating the same problem? As long as this is used
>>> on Windows from main() instead of wmain(), there's no way it will work
>>> consistently.
>>>
>>
>> First I wanna mention that this exact change is still needed even if we
>> decide to use apr app to host the binary. At this point there is no
>> difference between using apr app and svn's implementation of wmain and main
>> wrappers.
>>
>> Also, I'd like to argue about using apr app in general. The subversion's
>> command line wasn't using this library during the whole history (as far as
>> I know), so I don't think it's correct to start using it now. What would we
>> do if we need some functionality apr app doesn't implement? I mean that
>> would be a quite huge change...
>>
>> In my opinion, apr app is not something fully ready to use in subversion.
>> For example, it doesn't wrap main on unix platforms, while it still can be
>> executed with non-utf8-ized arguments. In wmain() apr_conv_ucs2_to_utf8()
>> just ignores the error, which is bad (on win32).
>>
>> And finally, the code we have right now is pretty good, and it seems
>> possible to support non-ASCII arguments (which I'm working on).
>>
>> I hope this makes some sense.
>>
>>
>>
>> OK, fine, but I still don't see where you're converting the argument list
>> directly from UTF-16-LE to UTF-8. svn_cmdline__win32_get_cstring_argv
>> doesn't do that, and it should.
>>
>> That's the main problem on Windows: that function is converting to the
>> local multibyte character set, then svn_client_args_to_target_array2
>> converts /that/ to UTF-8. That's fine-ish on Unix, where we don't convert
>> twice, but completely brain-dead on Windows. And I freely admit I had a lot
>> do do with that brain-deadedness.
>>
>> This is what should be happening, IMO:
>>
>>    1. deprecate svn_client_args_to_target_array2
>>    2. rename your svn_client_args_to_target_array_utf8
>>    svn_client_args_to_target_array3 and make it explicitly require that the
>>    args are already in UTF-8
>>    3. svn_cmdline__win32_get_cstring_argv must convert to UTF-8
>>    4. svn_cmdline__default_get_cstring_argv must *also* convert to UTF-8
>>
>>
>> You've done steps 1 and 2, 3 should be trivial and 4 will make the result
>> compatible with 3.
>>
>> -- Brane
>>
>
> IMHO, it's much better to follow my way instead, in which I'm adding utf8
> alternatives for those functions, without deprecating old ones, let's call
> them cstring.
>
>
>
> The old ones are wrong, period. Subversion's internal character set has
> been UTF-8 practically from day 1. svn_client_args_to_target_array2 is
> violating that convention. We have conversion functions.
>
>
>
Oh, exactly! You are right and this finally makes sense. I didn't initially
think about that. Let me quickly fix that...


>
> This is exactly what happened with svn_cmdline__get_cstring_argv(). I
> added a new function svn_cmdline__get_utf8_argv(), that returns a utf8
> string instead. Then we can switch the one used in each executable
> one-by-one.
>
>
> Those are private functions, you can do whatever you want with them, as
> long as you end up with only one version of each instead of, I dunno, five.
>
>
Those are also transitional functions needed to migrate all usages
smoothly. Probably it makes sense to drop the old ones then, but it would
not be a big deal for us in future...


>
> The same thing with svn_client_args_to_target_array_utf8, and also this
> difference in names is also helpful to differentiate between cstring and
> utf8 versions much more than version bump. Also the old one has not been
> deprecated, because someone may still want this function to convert
> encoding.
>
>
> Except for the encoding conversion functions, *none* of our API has utf8
> in the name, because that's understood and implied.
> svn_client_args_to_target_array2 is one of the few exceptions that converts
> between encoding without saying so explicitly in the name, and it's been a
> pain to deal with precisely because of that.
>
> Our API requires that strings are in UTF-8. The fewer exceptions we have
> to that, the better.
>
>
Now, I see.


>
> Let's not overcomplicate things if possible.
>
>
> Yes, exactly. Let's not maintain multiple incompatible versions of the
> same function. I completely agree.
>

 Yay!


>
> -- Brane
>


-- 
Timofei Zhakov

Reply via email to