merge_tests.py

Daniel Shahaf Sat, 26 Sep 2020 03:13:43 -0700

Yasuhito FUTATSUKI wrote on Fri, 25 Sep 2020 17:05 +0900:
> On 2020/09/25 16:06, Daniel Shahaf wrote:
> > [email protected] wrote on Thu, 24 Sep 2020 17:06 -0000:  
> >> Author: futatuki
> >> Date: Thu Sep 24 17:06:39 2020
> >> New Revision: 1881985
> >>
> >> URL: http://svn.apache.org/viewvc?rev=1881985&view=rev
> >> Log:
> >> Follow up to r1880192: Fix an EOL issue in test on Windows.
> >>
> >> * subversion/tests/cmdline/merge-tests.py
> >>   (merge_deleted_folder_with_mergeinfo_2): Use os.linesep instead of '\n'
> >>   in expected values of svn:mergeinfo.
> >>
> >> Modified:
> >>     subversion/trunk/subversion/tests/cmdline/merge_tests.py
> >>
> >> Modified: subversion/trunk/subversion/tests/cmdline/merge_tests.py
> >> URL: 
> >> http://svn.apache.org/viewvc/subversion/trunk/subversion/tests/cmdline/merge_tests.py?rev=1881985&r1=1881984&r2=1881985&view=diff
> >> ==============================================================================
> >> --- subversion/trunk/subversion/tests/cmdline/merge_tests.py (original)
> >> +++ subversion/trunk/subversion/tests/cmdline/merge_tests.py Thu Sep 24 
> >> 17:06:39 2020
> >> @@ -18639,11 +18639,19 @@ def merge_deleted_folder_with_mergeinfo_
> >>                                         )
> >>  
> >>    # verify that mergeinfo is set/changed on A/D, A/D/G, A/D/G2.
> >> +
> >> +  # NOTE: When writing out multi-line prop values in svn:* props, the
> >> +  # client converts to local encoding and local eol style.
> >> +  # Therefore, the expected output must contain the right kind of eoln
> >> +  # strings. That's why we use os.linesep in the tests below, not just
> >> +  # plain '\n'. The _last_ \n is also from the client, but it's not
> >> +  # part of the prop value and it doesn't get converted in the pipe.  
> 
> Before answer the questions, the comment above was brougt from 
> prop_value_conversion() in prop_tests.py.
>


*nod*

> > I'm confused.
> > 
> > First, if the client applies any transformation at all to property
> > values, that'd be a bug.  Property values are opaque octet sequences,
> > just like file contents, so they must be emitted verbatim.  It so
> > happens that values of svn:* properties are UTF-8 with LF line endings,
> > so that's what Python should see, regardless of the local encoding and
> > EOL style.  
> 
> I judged that EOL conversion of property values on 'svn pg' isn't bug
> because this comment is found in test for it.

If you mean prop_value_conversion(), that test isn't about EOL style
and encodings at all.  It's about normalization of the values of some
svn:* properties, such as this:
.
    % svn ps -q svn:executable yes iota 
    % svn pg svn:executable iota 
    *
    % 

The test for binary property values is prop_tests.py:binary_props().  It
doesn't cover newlines.

> If it is a bug, prop_tests.py also does not test correctly.

I've looked further and I think it's working as designed, but the
design is rather unintuitive.

In the data model, property values are binary data.  That's why, in
the API, property hashes' value type is svn_string_t.  However, the
cmdline client doesn't print all properties as binary data; it prints
svn:* properties as text, transcoded and EOL-converted, even as it
prints other properties in binary:
.
    
https://svn.apache.org/viewvc/subversion/tags/1.0.0/subversion/clients/cmdline/props.c?view=markup#l50

The code still exists in trunk@HEAD, in propget-cmd.c.

These semantics mean prop_tests.py and merge_tests.py are both correct
to expect os.linesep in the output.

However, I find these semantics rather unintuitive.  Imagine someone
using a foo:ignore property as staging for svn:ignore:

     1  % svn propset svn:ignore "予定表.txt" ./ 
     2  property 'svn:ignore' set on '.'
     3  % svn propset foo:ignore "予定表.txt" ./ 
     4  property 'foo:ignore' set on '.'
     5  % LC_ALL=ja_JP.eucjp svn pl -v
     6  Properties on '.':
     7    foo:ignore
     8      予定表.txt
     9    svn:ignore
    10      ͽɽ.txt
.
The same sequence of bytes is emitted differently depending on what
property it belongs to — once in binary (foo:ignore), once in the local
encoding (svn:ignore), within the same list (!).  (My terminal was in
UTF-8 mode the whole time.)

    11  % LC_ALL=C svn pg --strict svn:ignore
    12  {U+4E88}{U+5B9A}{U+8868}.txt
.
There is no way in the cmdline client to get the value of an svn:*
property that's not representable in the local encoding; in part
because…

    13  % svn propset svn:ignore "{U+4E88}.txt" ./ 
    14  property 'svn:ignore' set on '.'
    15  % sqlite3 .svn/wc.db .dump | me
    16  (svn:ignore 29 {U+4E88}{U+5B9A}{U+8868}.txt )
    17  % svn pg --strict svn:ignore 
    18  {U+4E88}{U+5B9A}{U+8868}.txt
.
…the cmdline client's output is not unambiguously parseable.  (These
commands were run in a UTF-8 locale.)

[On line 15 I grepped for the property skel (line 16)'s hexdump
representation — that's how sqlite3 dumps render BLOB columns — and
manually extracted and converted it.  E.g., to find "4E88", I'd grep
for its ASCII hex representation, "34453838".]

On the other hand, for «propset» there is a --encoding=UTF-8 flag that
allows property values not representable in the current encoding to be
set.  (The value still undergoes EOL conversion.  I haven't checked
whether it undergoes composition or decomposition.)  However, one still
has to know that «svn propset foo:ignore -F» takes binary input while
«svn propset svn:ignore -F» takes input in the local encoding.

So, I think there are a number of different issues/gotchas here:

- prop_tests.py:binary_props() should cover property values that contain
  bytes values such as 0x0A, 0x0D, and 0x80-0x8F (newlines and non-ASCII).

- Getting the value of an svn:* property is different to getting the
  value of any other property, in both «propget» and «proplist».

- It's not possible to get the raw value of an svn:* property in
  a working copy if the value is not representable in the local encoding.

- The {U+hhhh} fallback encoding is ambiguous.  (And in any case,
  I guess scripts are going to have to roll their own parsing if they
  want to undo this escaping?)

- Setting the value of an svn:* property is different to setting the
  value of any other property, and there will not necessarily be an
  error message.  (For instance, imagine extracting an svn:ignore value
  from a dumpfile into a temporary file and then using «svn propset -F»
  on the temporary file, while the local encoding is ISO-8859-1.  That
  will silently succeed.)

- The remark in prop_value_conversion() about a 'last \n' has bitrotted.
  [It should have been removed in r845498 (= r5424) — which,
  coincidentally, is also the revision in which binary_props() was
  added.]

> > For the same reason, I'd have expected the expected output to be given
> > as bytes objects rather than as str objects.  
> 
> Those expected values are converted to bytes, just below of the diff
> lines.

Thanks, I missed that.

Yasuhito FUTATSUKI wrote on Fri, 25 Sep 2020 19:09 +0900:
> On 2020/09/25 17:05, Yasuhito FUTATSUKI wrote:
> > On 2020/09/25 16:06, Daniel Shahaf wrote:  
> 
> >> First, if the client applies any transformation at all to property
> >> values, that'd be a bug.  Property values are opaque octet sequences,
> >> just like file contents, so they must be emitted verbatim.  It so
> >> happens that values of svn:* properties are UTF-8 with LF line endings,
> >> so that's what Python should see, regardless of the local encoding and
> >> EOL style.  
> > 
> > I judged that EOL conversion of property values on 'svn pg' isn't bug
> > because this comment is found in test for it. If it is a bug,
> > prop_tests.py also does not test correctly.  
> 
> This behavior was introduced in r1003238, as a "fix" of the issue #3721.

Good find — but if that's the case, why was the comment describing this
behaviour was added to the test much earlier, in r845369 (= r5295)?

Cheers,

Daniel

Re: svn commit: r1881985 - /subversion/trunk/subversion/tests/cmdline/merge_tests.py

Reply via email to