On 26. 5. 25 21:25, Timofei Zhakov wrote:
On Mon, May 26, 2025 at 8:29 PM Branko Čibej <br...@apache.org> wrote:
On 26. 5. 25 14:07, Timofei Zhakov wrote:
On Tue, May 20, 2025 at 9:29 PM Branko Čibej <br...@apache.org>
wrote:
On 20. 5. 25 20:31, rin...@apache.org wrote:
Author: rinrab
Date: Tue May 20 18:31:35 2025
New Revision: 1925722
URL:http://svn.apache.org/viewvc?rev=1925722&view=rev
<http://svn.apache.org/viewvc?rev=1925722&view=rev>
Log:
On the 'utf8-cmdline-prototype' branch: add a new function that
retrieves
target list from args, instead of converting the encoding, assumes that
args have already been converted.
Once again, why don't you use apr_app_initialize() that does
all that and more, instead of perpetuating the same problem?
As long as this is used on Windows from main() instead of
wmain(), there's no way it will work consistently.
First I wanna mention that this exact change is still needed even
if we decide to use apr app to host the binary. At this point
there is no difference between using apr app and svn's
implementation of wmain and main wrappers.
Also, I'd like to argue about using apr app in general. The
subversion's command line wasn't using this library during the
whole history (as far as I know), so I don't think it's correct
to start using it now. What would we do if we need some
functionality apr app doesn't implement? I mean that would be a
quite huge change...
In my opinion, apr app is not something fully ready to use in
subversion. For example, it doesn't wrap main on unix platforms,
while it still can be executed with non-utf8-ized arguments. In
wmain() apr_conv_ucs2_to_utf8() just ignores the error, which is
bad (on win32).
And finally, the code we have right now is pretty good, and it
seems possible to support non-ASCII arguments (which I'm working
on).
I hope this makes some sense.
OK, fine, but I still don't see where you're converting the
argument list directly from UTF-16-LE to UTF-8.
svn_cmdline__win32_get_cstring_argv doesn't do that, and it should.
That's the main problem on Windows: that function is converting to
the local multibyte character set, then
svn_client_args_to_target_array2 converts /that/ to UTF-8. That's
fine-ish on Unix, where we don't convert twice, but completely
brain-dead on Windows. And I freely admit I had a lot do do with
that brain-deadedness.
This is what should be happening, IMO:
1. deprecate svn_client_args_to_target_array2
2. rename your svn_client_args_to_target_array_utf8
svn_client_args_to_target_array3 and make it explicitly
require that the args are already in UTF-8
3. svn_cmdline__win32_get_cstring_argv must convert to UTF-8
4. svn_cmdline__default_get_cstring_argv must *also* convert to UTF-8
You've done steps 1 and 2, 3 should be trivial and 4 will make the
result compatible with 3.
-- Brane
IMHO, it's much better to follow my way instead, in which I'm adding
utf8 alternatives for those functions, without deprecating old ones,
let's call them cstring.
The old ones are wrong, period. Subversion's internal character set has
been UTF-8 practically from day 1. svn_client_args_to_target_array2 is
violating that convention. We have conversion functions.
This is exactly what happened with svn_cmdline__get_cstring_argv(). I
added a new function svn_cmdline__get_utf8_argv(), that returns a utf8
string instead. Then we can switch the one used in each executable
one-by-one.
Those are private functions, you can do whatever you want with them, as
long as you end up with only one version of each instead of, I dunno, five.
The same thing with svn_client_args_to_target_array_utf8, and also
this difference in names is also helpful to differentiate between
cstring and utf8 versions much more than version bump. Also the old
one has not been deprecated, because someone may still want this
function to convert encoding.
Except for the encoding conversion functions, /none/ of our API has utf8
in the name, because that's understood and implied.
svn_client_args_to_target_array2 is one of the few exceptions that
converts between encoding without saying so explicitly in the name, and
it's been a pain to deal with precisely because of that.
Our API requires that strings are in UTF-8. The fewer exceptions we have
to that, the better.
Let's not overcomplicate things if possible.
Yes, exactly. Let's not maintain multiple incompatible versions of the
same function. I completely agree.
-- Brane