On 2020/10/05 1:57, Daniel Shahaf wrote:
> Yasuhito FUTATSUKI wrote on Sun, 04 Oct 2020 21:56 +0900:
>> On 2020/09/26 19:12, Daniel Shahaf wrote:
>>>  1 % svn propset svn:ignore "予定表.txt" ./
>>>  2 property 'svn:ignore' set on '.'
>>>  3 % svn propset foo:ignore "予定表.txt" ./
>>>  4 property 'foo:ignore' set on '.'
>>>  5 % LC_ALL=ja_JP.eucjp svn pl -v
>>>  6 Properties on '.':
>>>  7   foo:ignore
>>>  8     予定表.txt
>>>  9   svn:ignore
>>> 10     ͽɽ.txt
>>>
>>> 11 % LC_ALL=C svn pg --strict svn:ignore
>>> 12 {U+4E88}{U+5B9A}{U+8868}.txt
>>>
>>> 13 % svn propset svn:ignore "{U+4E88}.txt" ./
>>> 14 property 'svn:ignore' set on '.'
>>> 15 % sqlite3 .svn/wc.db .dump | me
>>> 16 (svn:ignore 29 {U+4E88}{U+5B9A}{U+8868}.txt )
>>> 17 % svn pg --strict svn:ignore
>>> 18 {U+4E88}{U+5B9A}{U+8868}.txt
>>> .
>>> So, I think there are a number of different issues/gotchas here:
>>>
>>> - It's not possible to get the raw value of an svn:* property in
>>>   a working copy if the value is not representable in the local encoding.
>>
>> I believe that if we want to get property values precisely, we should
>> use xml output, although --no-newline is enough in most cases except
>> this case.
>
> Hmm, that's an interesting one.  On the one hand, «propget --xml»
> does resolve the ambiguity issue of the ad-hoc escaping; on the other
> hand:
>
> - We shouldn't require CLI users to use an XML parser in order to
>   retrieve values of binary blobs.

Then do we need a new output format for "strict" values?

> - The XML document declares itself to be in UTF-8.  Does that mean XML
>   parsers are allowed to treat the dumped property values as UTF-8 and,
>   for example, convert the byte sequence (that comprises the value) to
>   another byte sequence, that's equivalent when treated as UTF-8 but
>   not equivalent when treated as binary blobs?  (For example, convert
>   the UTF-8 to composed or decomposed normal form.)

At the least, we expect the parser not to convert the byte sequence
if the value is considered safe by svn_xml_is_xml_safe().  If that is
not so, I think the output of --xml is broken.
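
To make the concern concrete, here is a rough Python sketch, not taken
from the Subversion sources.  The sample document is hand-written to
resemble «propget --xml» output rather than captured from a real run,
and property_bytes() and normalization_changes_bytes() are names made
up for this example.  It assumes, as far as I know, that values which
fail svn_xml_is_xml_safe() are emitted with encoding="base64".  It
reads each value back as raw bytes, and shows why a parser that
normalized the text would return a different byte sequence.

```python
import base64
import unicodedata
import xml.etree.ElementTree as ET

# Hand-written sample shaped like `svn propget --xml` output (an assumption,
# not captured from a real working copy).  "foo:blob" is a hypothetical
# property whose value is not valid UTF-8.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<properties>\n'
    '<target path=".">\n'
    '<property name="svn:ignore">\u4e88\u5b9a\u8868.txt\n</property>\n'
    '<property name="foo:blob" encoding="base64">/wD+</property>\n'
    '</target>\n'
    '</properties>\n'
).encode("utf-8")


def property_bytes(xml_bytes):
    """Map each property name to its value as raw bytes."""
    props = {}
    root = ET.fromstring(xml_bytes)
    for prop in root.iter("property"):
        text = prop.text or ""
        if prop.get("encoding") == "base64":
            # Binary value: undo the base64 wrapping.
            props[prop.get("name")] = base64.b64decode(text)
        else:
            # Text value: re-encode exactly what the parser handed back.
            props[prop.get("name")] = text.encode("utf-8")
    return props


def normalization_changes_bytes(value, form):
    """True if normalizing the UTF-8 text to `form` alters the bytes."""
    text = value.decode("utf-8")
    return unicodedata.normalize(form, text).encode("utf-8") != value


if __name__ == "__main__":
    for name, value in sorted(property_bytes(SAMPLE).items()):
        print(name, value)
    composed = "\u304c".encode("utf-8")  # HIRAGANA LETTER GA, precomposed
    print(normalization_changes_bytes(composed, "NFD"))
```

Python's ElementTree does not normalize character data, so the
round-trip here is byte-exact; the second function only demonstrates
what would be lost if some other parser did normalize.
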

Moreover, as properties have no metadata about their contents, we can't
determine whether a property is text or not, even if it contains only
printable characters, like 'eicar.com'[1].  So it would not be so strange
if we used base64 encoding for all properties (but I don't think it is
a good idea).

[1] https://svn.haxx.se/dev/archive-2016-03/0043.shtml
(Yes, I was also trapped by it yesterday.)

Cheers,
-- 
Yasuhito FUTATSUKI <futat...@yf.bsclub.org>