[issue36789] Unicode HOWTO incorrectly states that UTF-8 contains no zero bytes
New submission from mbiggs:

The Unicode HOWTO (http://docs.python.org/3.3/howto/unicode.html) says the following:

"UTF-8 has several convenient properties: (...) 2. A Unicode string is turned into a sequence of bytes containing no embedded zero bytes. This avoids byte-ordering issues, and means UTF-8 strings can be processed by C functions such as strcpy() and sent through protocols that can’t handle zero bytes."

This is not right. UTF-8 uses the zero byte to represent the Unicode code point U+0000 (the ASCII NULL character). This is a valid character in UTF-8 and is handled just fine by Python's UTF-8 string encoding/decoding.

--
assignee: docs@python
components: Documentation
messages: 341363
nosy: docs@python, mbiggs
priority: normal
severity: normal
status: open
title: Unicode HOWTO incorrectly states that UTF-8 contains no zero bytes
versions: Python 2.7, Python 3.5, Python 3.6, Python 3.7, Python 3.8

Python tracker <https://bugs.python.org/issue36789>
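The behaviour described in the report can be checked directly. The following is a minimal sketch, not taken from the HOWTO or from either pull request; the variable and function names are illustrative only. It encodes U+0000, round-trips a string with an embedded NUL, and scans every other encodable code point for zero bytes.

```python
# U+0000 encodes to a single zero byte in UTF-8.
nul_encoded = "\x00".encode("utf-8")

# A string with an embedded NUL survives an encode/decode round trip.
text = "abc\x00def"
encoded = text.encode("utf-8")
round_trip = encoded.decode("utf-8")


def has_zero_byte(code_point):
    """Return True if the UTF-8 encoding of code_point contains a zero byte."""
    return 0 in chr(code_point).encode("utf-8")


# Check every code point other than U+0000. Surrogates (U+D800..U+DFFF)
# are skipped because they cannot be encoded as UTF-8.
others_with_zero = [
    cp
    for cp in range(1, 0x110000)
    if not 0xD800 <= cp <= 0xDFFF and has_zero_byte(cp)
]

print(nul_encoded, encoded, round_trip == text, others_with_zero)
```

The scan comes back empty, so the HOWTO's claim holds for every character except U+0000 itself, which is the one case the report is about.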
mbiggs added the comment:

So a correct statement would be "A UTF-8 string is turned into a sequence of bytes that contains embedded zero bytes only where they represent the NULL character (U+0000)." I think it's important to correct this because the part about processing UTF-8 with C functions like strcpy() was wrong and could cause bugs.
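To illustrate the kind of bug mentioned above, here is a small sketch using the standard `ctypes` module. It does not call strcpy() itself; it assumes that reading a buffer's `value` attribute is a fair stand-in for any C routine that treats the first zero byte as the end of the string. The sample data and variable names are made up for the example.

```python
import ctypes

# UTF-8 bytes for a five-character string with U+0000 in the middle.
data = "ab\x00cd".encode("utf-8")

# Copy the bytes into a C char array; ctypes appends a trailing NUL.
buf = ctypes.create_string_buffer(data)

# Reading the buffer as a NUL-terminated C string stops at the first
# zero byte, so everything after the embedded NUL is silently lost.
as_c_string = buf.value

# The raw buffer still holds every byte, including the terminator.
full_buffer = buf.raw

print(as_c_string, full_buffer)
```

Code that passes such data through NUL-terminated string functions would see only `ab`, which is why the HOWTO wording matters.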
Change by mbiggs:

--
pull_requests: +13102
mbiggs added the comment:

Ah, I sent a pull request but didn't realize that redshiftzero already had. Their PR looks good to me. Thanks for fixing this!