On Wed, 20 May 2026, 14:40 Daniel Sahlberg, <[email protected]>
wrote:

> On Wed, 20 May 2026 at 10:55, <[email protected]> wrote:
> >
> > Author: rinrab
> > Date: Wed May 20 08:55:33 2026
> > New Revision: 1934426
> >
> > Log:
> > Use UTF-8 alignment for the 'author' column in the 'svn blame' command.
> >
> > * subversion/svn/blame-cmd.c
> >   (#include): Add svn_utf_private.h.
> >   (print_line_info): Call svn_utf__cstring_utf8_align_right() to
> >    prepare author.
> >
> > Modified:
> >    subversion/trunk/subversion/svn/blame-cmd.c
> >
> > Modified: subversion/trunk/subversion/svn/blame-cmd.c
> > ==============================================================================
> > --- subversion/trunk/subversion/svn/blame-cmd.c Wed May 20 08:30:24 2026        (r1934425)
> > +++ subversion/trunk/subversion/svn/blame-cmd.c Wed May 20 08:55:33 2026        (r1934426)
> > @@ -24,6 +24,7 @@
> >
> >  /*** Includes. ***/
> >
> > +#include "private/svn_utf_private.h"
> >  #include "svn_client.h"
> >  #include "svn_error.h"
> >  #include "svn_dirent_uri.h"
> > @@ -150,8 +151,9 @@ print_line_info(svn_stream_t *out,
> >            time_stdout = "                                           -";
> >          }
> >
> > -      SVN_ERR(svn_stream_printf(out, pool, "%s %10s %s ", rev_str,
> > -                                author ? author : "         -",
> > +      SVN_ERR(svn_stream_printf(out, pool, "%s %s %s ", rev_str,
> > +                                svn_utf__cstring_utf8_align_right(
> > +                                    author ? author : "-", 10, pool),
> >                                  time_stdout));
>
> After this change the output of svn blame is different from before if
> there is a very long author name.
>
> I have tested with svn compiled about a month ago (the version in
> $PATH) and from a brand new (in ./subversion/svn). I have prepared a
> repo with a file where all lines but one are authored by "dsg" and the
> remaining one by "averylongauthor" (15 characters, ASCII).
>
> This is my commit #2 by the long author:
> [[[
> dsg@devi-25-01:~/svn_trunk3$ ./subversion/svn/svn proplist -v --revprop -r2 ../wc/foo
> Unversioned properties on revision 2:
>   svn:author
>     averylongauthor
>   svn:date
>     2026-05-20T11:52:35.534418Z
>   svn:log
>     Modify line 4
> ]]]
>
> Blame before the change above:
> [[[
> dsg@devi-25-01:~/svn_trunk3$ svn blame ../wc/foo
>      1        dsg 1
>      1        dsg 2
>      1        dsg 3
>      2 averylonga Line 4
>      1        dsg 5
>      1        dsg 6
>      1        dsg 7
>      1        dsg 8
>      1        dsg 9
> ]]]
> Author names are right adjusted but when overflowing, the first 10
> characters are displayed.
>
> Blame after the change above:
> [[[
> dsg@devi-25-01:~/svn_trunk3$ ./subversion/svn/svn blame ../wc/foo
>      1        dsg 1
>      1        dsg 2
>      1        dsg 3
>      2 longauthor Line 4
>      1        dsg 5
>      1        dsg 6
>      1        dsg 7
>      1        dsg 8
>      1        dsg 9
> ]]]
> Author names are right adjusted but when overflowing, the last 10
> characters are displayed.
>
> (I'm aware there are more instances of svn_stream_printf and I haven't
> analysed exactly which one is involved here).
>
> I think we need to keep the precision in the formatting string and use
> the _align_left version.
>
> Kind regards,
> Daniel
>
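
To make the point about precision concrete, here is a minimal sketch of
what a width plus a precision does in a printf-style format. It assumes
the truncating call site uses something like "%10.10s"; the thread does
not say which svn_stream_printf call is actually involved, and
format_author() is a hypothetical helper, not a Subversion function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Subversion: format AUTHOR into OUT
   with a minimum field width of 10 and a precision of 10. */
static void
format_author(char *out, size_t out_size, const char *author)
{
  assert(out != NULL && out_size > 0);

  /* Width 10 pads short names on the left (right alignment).
     Precision .10 keeps at most the first 10 bytes of a long name. */
  snprintf(out, out_size, "%10.10s", author ? author : "-");
}
```

With this format "dsg" is padded to ten columns and "averylongauthor"
becomes "averylonga", which matches the old output quoted above. Note that
the precision counts bytes, not characters, so on its own it can cut a
multi-byte UTF-8 sequence in the middle.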


Agreed, this is a very breaking/broken change. Changes that affect program
output need to be discussed on list and tested. This comment caught my
attention:


+ * Please note, there might be a little artifact when there is a wider
+ * character, then the string won't be perfectly aligned.


If true, it implies that svn_utf8_width() or whatever the function is
called isn't returning correct results.

I can't find the discussion about this now but I'd just note that
calculating the width of a Unicode string by only looking at individual
code points is not correct. Therefore, pruning away individual code points
without context in order to get a shorter string is not correct, either.
A single glyph can be built from several code points; some emoji sequences
use five or more.
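
A small sketch of why counting code points is not the same as measuring
width. count_code_points() is a hypothetical helper written for this
mail, not anything in libsvn_subr, and it assumes its input is valid,
NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the code points in a valid UTF-8 string by
   counting every byte that is not a continuation byte (10xxxxxx). */
static size_t
count_code_points(const char *s)
{
  size_t n = 0;

  assert(s != NULL);
  for (; *s; s++)
    if (((unsigned char)*s & 0xC0) != 0x80)
      n++;

  return n;
}
```

For the string "e\xCC\x81" ('e' followed by U+0301 COMBINING ACUTE ACCENT)
this returns 2, although the two code points together form one
user-perceived character. Padding to N code points, or dropping code
points one at a time, can therefore misalign the column or split a
character from its combining mark.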

-- Brane

Whoever sold us Unicode as a fixed-width encoding was running a pyramid
scheme. 😏
