>>>>> On Wed, 22 Nov 2017, Michał Górny wrote:

> Path and filename encoding
> --------------------------
> The path fields in the Manifest file must consist of characters
> corresponding to valid UTF-8 code points excluding the NULL character
> (``U+0000``), the backwards slash (``\``) and characters classified
> as whitespace in the current version of the Unicode standard
> [#UNICODE]_.

As I said before, all C0 and C1 control characters and DEL should be
excluded as well, i.e. 0x00 to 0x1f, 0x7f, and 0x80 to 0x9f. Allowing
such characters in what is basically a text file is only asking for
trouble.

> Any of the excluded characters that are present in path must be encoded
> using one of the following escape sequences:

> - characters in the ``U+0000`` to ``U+007F`` range can be encoded
>   as ``\xHH`` where ``HH`` specifies the zero-padded, hexadecimal
>   character code,

> - characters in the ``U+0000`` to ``U+FFFF`` range can be encoded
>   as ``\uHHHH`` where ``HHHH`` specifies the zero-padded, hexadecimal
>   character code,

> - characters in the UCS-4 range can be encoded as ``\UHHHHHHHH``
>   where ``HHHHHHHH`` specifies the zero-padded, hexadecimal character
>   code.

> It is invalid for backwards slash to be used in any other context,
> and a backwards slash present in filename must be encoded. Backwards
> slash used as path component separator should be replaced by forward
> slash instead.

This entire section about the escape mechanism should be clearly
labelled as being purely optional, as it is not relevant for Gentoo
(and would break backwards compatibility with existing package manager
implementations). Maybe add a reference to GLEP 31 too?

> The encoding can be used for other characters as well. In particular,
> escaping control characters is recommended to ensure that the file
> works correctly in text editors.

See above, this should not be "recommended", but literal control chars
should be strictly forbidden.

Ulrich
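
For illustration only, the escape scheme quoted above could be sketched
roughly as follows. This is a minimal sketch and not taken from any
package manager: the names escape_path, unescape_path and _needs_escape
are hypothetical, the excluded set also covers the C0/C1 controls and
DEL as suggested in this mail, and Python's str.isspace() is assumed to
be a close enough stand-in for "whitespace in the current version of
the Unicode standard".

```python
import re

# Hypothetical helpers for illustration; not part of any GLEP or Portage API.

def _needs_escape(ch: str) -> bool:
    """Return True if ch must not appear literally in a path field."""
    cp = ord(ch)
    return (
        ch == "\\"                # backslash is reserved for escapes
        or ch.isspace()           # assumption: approximates Unicode whitespace
        or cp <= 0x1F             # C0 controls, including NUL
        or cp == 0x7F             # DEL
        or 0x80 <= cp <= 0x9F     # C1 controls
    )

def escape_path(path: str) -> str:
    """Encode excluded characters using the shortest applicable escape."""
    out = []
    for ch in path:
        cp = ord(ch)
        if not _needs_escape(ch):
            out.append(ch)
        elif cp <= 0x7F:
            out.append(f"\\x{cp:02X}")
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            out.append(f"\\U{cp:08X}")
    return "".join(out)

_ESCAPE = re.compile(
    r"\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))"
)

def unescape_path(escaped: str) -> str:
    """Decode escapes; any other use of a backslash is rejected."""
    out = []
    pos = 0
    while pos < len(escaped):
        if escaped[pos] != "\\":
            out.append(escaped[pos])
            pos += 1
            continue
        m = _ESCAPE.match(escaped, pos)
        if m is None:
            raise ValueError(f"invalid escape at offset {pos}")
        value = int(m.group(1) or m.group(2) or m.group(3), 16)
        if m.group(1) is not None and value > 0x7F:
            # The \xHH form is only defined for U+0000 to U+007F.
            raise ValueError(f"\\x escape out of range at offset {pos}")
        out.append(chr(value))  # chr() raises ValueError above U+10FFFF
        pos = m.end()
    return "".join(out)
```

With these definitions, escape_path("my file\t1") gives my\x20file\x091,
and unescape_path() turns that back into the original string. A reader
that treats the escape mechanism as optional would skip unescape_path()
and simply reject any path containing a backslash.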