On 26. 5. 25 21:25, Timofei Zhakov wrote:
On Mon, May 26, 2025 at 8:29 PM Branko Čibej <br...@apache.org> wrote:

    On 26. 5. 25 14:07, Timofei Zhakov wrote:
    On Tue, May 20, 2025 at 9:29 PM Branko Čibej <br...@apache.org>
    wrote:

        On 20. 5. 25 20:31, rin...@apache.org wrote:
        Author: rinrab
        Date: Tue May 20 18:31:35 2025
        New Revision: 1925722

        URL:http://svn.apache.org/viewvc?rev=1925722&view=rev  
<http://svn.apache.org/viewvc?rev=1925722&view=rev>
        Log:
        On the 'utf8-cmdline-prototype' branch: add a new function that 
retrieves
        target list from args, instead of converting the encoding, assumes that
        args have already been converted.


        Once again, why don't you use apr_app_initialize() that does
        all that and more, instead of perpetuating the same problem?
        As long as this is used on Windows from main() instead of
        wmain(), there's no way it will work consistently.


    First I wanna mention that this exact change is still needed even
    if we decide to use apr app to host the binary. At this point
    there is no difference between using apr app and svn's
    implementation of wmain and main wrappers.

    Also, I'd like to argue about using apr app in general. The
    subversion's command line wasn't using this library during the
    whole history (as far as I know), so I don't think it's correct
    to start using it now. What would we do if we need some
    functionality apr app doesn't implement? I mean that would be a
    quite huge change...

    In my opinion, apr app is not something fully ready to use in
    subversion. For example, it doesn't wrap main on unix platforms,
    while it still can be executed with non-utf8-ized arguments. In
    wmain() apr_conv_ucs2_to_utf8() just ignores the error, which is
    bad (on win32).

    And finally, the code we have right now is pretty good, and it
    seems possible to support non-ASCII arguments (which I'm working
    on).

    I hope this makes some sense.


    OK, fine, but I still don't see where you're converting the
    argument list directly from UTF-16-LE to UTF-8.
    svn_cmdline__win32_get_cstring_argv doesn't do that, and it should.

    That's the main problem on Windows: that function is converting to
    the local multibyte character set, then
    svn_client_args_to_target_array2 converts /that/ to UTF-8. That's
    fine-ish on Unix, where we don't convert twice, but completely
    brain-dead on Windows. And I freely admit I had a lot do do with
    that brain-deadedness.

    This is what should be happening, IMO:

     1. deprecate svn_client_args_to_target_array2
     2. rename your svn_client_args_to_target_array_utf8
        svn_client_args_to_target_array3 and make it explicitly
        require that the args are already in UTF-8
     3. svn_cmdline__win32_get_cstring_argv must convert to UTF-8
     4. svn_cmdline__default_get_cstring_argv must *also* convert to UTF-8


    You've done steps 1 and 2, 3 should be trivial and 4 will make the
    result compatible with 3.

    -- Brane


IMHO, it's much better to follow my way instead, in which I'm adding utf8 alternatives for those functions, without deprecating old ones, let's call them cstring.


The old ones are wrong, period. Subversion's internal character set has been UTF-8 practically from day 1. svn_client_args_to_target_array2 is violating that convention. We have conversion functions.



This is exactly what happened with svn_cmdline__get_cstring_argv(). I added a new function svn_cmdline__get_utf8_argv(), that returns a utf8 string instead. Then we can switch the one used in each executable one-by-one.

Those are private functions, you can do whatever you want with them, as long as you end up with only one version of each instead of, I dunno, five.


The same thing with svn_client_args_to_target_array_utf8, and also this difference in names is also helpful to differentiate between cstring and utf8 versions much more than version bump. Also the old one has not been deprecated, because someone may still want this function to convert encoding.

Except for the encoding conversion functions, /none/ of our API has utf8 in the name, because that's understood and implied. svn_client_args_to_target_array2 is one of the few exceptions that converts between encoding without saying so explicitly in the name, and it's been a pain to deal with precisely because of that.

Our API requires that strings are in UTF-8. The fewer exceptions we have to that, the better.


Let's not overcomplicate things if possible.

Yes, exactly. Let's not maintain multiple incompatible versions of the same function. I completely agree.

-- Brane

Reply via email to