Yasuhito FUTATSUKI wrote on Fri, 25 Sep 2020 17:05 +0900: > On 2020/09/25 16:06, Daniel Shahaf wrote: > > futat...@apache.org wrote on Thu, 24 Sep 2020 17:06 -0000: > >> Author: futatuki > >> Date: Thu Sep 24 17:06:39 2020 > >> New Revision: 1881985 > >> > >> URL: http://svn.apache.org/viewvc?rev=1881985&view=rev > >> Log: > >> Follow up to r1880192: Fix an EOL issue in test on Windows. > >> > >> * subversion/tests/cmdline/merge-tests.py > >> (merge_deleted_folder_with_mergeinfo_2): Use os.linesep instead of '\n' > >> in expected values of svn:mergeinfo. > >> > >> Modified: > >> subversion/trunk/subversion/tests/cmdline/merge_tests.py > >> > >> Modified: subversion/trunk/subversion/tests/cmdline/merge_tests.py > >> URL: > >> http://svn.apache.org/viewvc/subversion/trunk/subversion/tests/cmdline/merge_tests.py?rev=1881985&r1=1881984&r2=1881985&view=diff > >> ============================================================================== > >> --- subversion/trunk/subversion/tests/cmdline/merge_tests.py (original) > >> +++ subversion/trunk/subversion/tests/cmdline/merge_tests.py Thu Sep 24 > >> 17:06:39 2020 > >> @@ -18639,11 +18639,19 @@ def merge_deleted_folder_with_mergeinfo_ > >> ) > >> > >> # verify that mergeinfo is set/changed on A/D, A/D/G, A/D/G2. > >> + > >> + # NOTE: When writing out multi-line prop values in svn:* props, the > >> + # client converts to local encoding and local eol style. > >> + # Therefore, the expected output must contain the right kind of eoln > >> + # strings. That's why we use os.linesep in the tests below, not just > >> + # plain '\n'. The _last_ \n is also from the client, but it's not > >> + # part of the prop value and it doesn't get converted in the pipe. > > Before answer the questions, the comment above was brougt from > prop_value_conversion() in prop_tests.py. >
*nod* > > I'm confused. > > > > First, if the client applies any transformation at all to property > > values, that'd be a bug. Property values are opaque octet sequences, > > just like file contents, so they must be emitted verbatim. It so > > happens that values of svn:* properties are UTF-8 with LF line endings, > > so that's what Python should see, regardless of the local encoding and > > EOL style. > > I judged that EOL conversion of property values on 'svn pg' isn't bug > because this comment is found in test for it. If you mean prop_value_conversion(), that test isn't about EOL style and encodings at all. It's about normalization of the values of some svn:* properties, such as this: . % svn ps -q svn:executable yes iota % svn pg svn:executable iota * % The test for binary property values is prop_tests.py:binary_props(). It doesn't cover newlines. > If it is a bug, prop_tests.py also does not test correctly. I've looked further and I think it's working as designed, but the design is rather unintuitive. In the data model, property values are binary data. That's why, in the API, property hashes' value type is svn_string_t. However, the cmdline client doesn't print all properties as binary data; it prints svn:* properties as text, transcoded and EOL-converted, even as it prints other properties in binary: . https://svn.apache.org/viewvc/subversion/tags/1.0.0/subversion/clients/cmdline/props.c?view=markup#l50 The code still exists in trunk@HEAD, in propget-cmd.c. These semantics mean prop_tests.py and merge_tests.py are both correct to expect os.linesep in the output. However, I find these semantics rather unintuitive. Imagine someone using a foo:ignore property as staging for svn:ignore: 1 % svn propset svn:ignore "予定表.txt" ./ 2 property 'svn:ignore' set on '.' 3 % svn propset foo:ignore "予定表.txt" ./ 4 property 'foo:ignore' set on '.' 5 % LC_ALL=ja_JP.eucjp svn pl -v 6 Properties on '.': 7 foo:ignore 8 予定表.txt 9 svn:ignore 10 ͽɽ.txt . The same sequence of bytes is emitted differently depending on what property it belongs to — once in binary (foo:ignore), once in the local encoding (svn:ignore), within the same list (!). (My terminal was in UTF-8 mode the whole time.) 11 % LC_ALL=C svn pg --strict svn:ignore 12 {U+4E88}{U+5B9A}{U+8868}.txt . There is no way in the cmdline client to get the value of an svn:* property that's not representable in the local encoding; in part because… 13 % svn propset svn:ignore "{U+4E88}.txt" ./ 14 property 'svn:ignore' set on '.' 15 % sqlite3 .svn/wc.db .dump | me 16 (svn:ignore 29 {U+4E88}{U+5B9A}{U+8868}.txt ) 17 % svn pg --strict svn:ignore 18 {U+4E88}{U+5B9A}{U+8868}.txt . …the cmdline client's output is not unambiguously parseable. (These commands were run in a UTF-8 locale.) [On line 15 I grepped for the property skel (line 16)'s hexdump representation — that's how sqlite3 dumps render BLOB columns — and manually extracted and converted it. E.g., to find "4E88", I'd grep for its ASCII hex representation, "34453838".] On the other hand, for «propset» there is a --encoding=UTF-8 flag that allows property values not representable in the current encoding to be set. (The value still undergoes EOL conversion. I haven't checked whether it undergoes composition or decomposition.) However, one still has to know that «svn propset foo:ignore -F» takes binary input while «svn propset svn:ignore -F» takes input in the local encoding. So, I think there are a number of different issues/gotchas here: - prop_tests.py:binary_props() should cover property values that contain bytes values such as 0x0A, 0x0D, and 0x80-0x8F (newlines and non-ASCII). - Getting the value of an svn:* property is different to getting the value of any other property, in both «propget» and «proplist». - It's not possible to get the raw value of an svn:* property in a working copy if the value is not representable in the local encoding. - The {U+hhhh} fallback encoding is ambiguous. (And in any case, I guess scripts are going to have to roll their own parsing if they want to undo this escaping?) - Setting the value of an svn:* property is different to setting the value of any other property, and there will not necessarily be an error message. (For instance, imagine extracting an svn:ignore value from a dumpfile into a temporary file and then using «svn propset -F» on the temporary file, while the local encoding is ISO-8859-1. That will silently succeed.) - The remark in prop_value_conversion() about a 'last \n' has bitrotted. [It should have been removed in r845498 (= r5424) — which, coincidentally, is also the revision in which binary_props() was added.] > > For the same reason, I'd have expected the expected output to be given > > as bytes objects rather than as str objects. > > Those expected values are converted to bytes, just below of the diff > lines. Thanks, I missed that. Yasuhito FUTATSUKI wrote on Fri, 25 Sep 2020 19:09 +0900: > On 2020/09/25 17:05, Yasuhito FUTATSUKI wrote: > > On 2020/09/25 16:06, Daniel Shahaf wrote: > > >> First, if the client applies any transformation at all to property > >> values, that'd be a bug. Property values are opaque octet sequences, > >> just like file contents, so they must be emitted verbatim. It so > >> happens that values of svn:* properties are UTF-8 with LF line endings, > >> so that's what Python should see, regardless of the local encoding and > >> EOL style. > > > > I judged that EOL conversion of property values on 'svn pg' isn't bug > > because this comment is found in test for it. If it is a bug, > > prop_tests.py also does not test correctly. > > This behavior was introduced in r1003238, as a "fix" of the issue #3721. Good find — but if that's the case, why was the comment describing this behaviour was added to the test much earlier, in r845369 (= r5295)? Cheers, Daniel