Bruno Cauet <brunoca...@gmail.com> writes:
> Unicode characters outside the ASCII range also get encoded when they
> have no reason to, e.g.
> >>> pathlib.PurePath("/home/싸이/").as_uri()
> 'file:///home/%EC%8B%B8%EC%9D%B4'

Non-ASCII characters are not legal URI characters. Look at section 2.3 of "http://www.faqs.org/rfcs/rfc2396.html". There you see "unreserved = alphanum | mark", with "alphanum" defined in section 1.6 as the ASCII letters and digits.

See also section 2.1 ("URI and non-ASCII characters"). It says that non-ASCII characters should be UTF-8 encoded and then URI-escaped. Thus, the handling (by "urllib") of non-ASCII Unicode characters seems to be correct.
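
To make the "UTF-8 encode, then escape" step concrete, here is a small sketch of my own (not taken from the RFC or from the pathlib source). It uses only "urllib.parse.quote" and "urllib.parse.unquote" from the standard library; the variable names are just for illustration.

```python
from urllib.parse import quote, unquote

path = "/home/싸이"

# Step 1: encode the non-ASCII characters as UTF-8 bytes.
# Step 2: write each byte as %XX (uppercase hex).
by_hand = "".join("%{:02X}".format(b) for b in "싸이".encode("utf-8"))
print(by_hand)

# quote() does both steps at once; it defaults to UTF-8 and leaves "/" alone.
escaped = quote(path)
print(escaped)

# The escaping is reversible, so no information is lost.
print(unquote(escaped) == path)
```

The escaped form matches the path part of the "as_uri()" output quoted above, and "unquote" recovers the original string.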