New submission from mbiggs <pythonb...@doubleplum.net>:

In the Unicode HOWTO: http://docs.python.org/3.3/howto/unicode.html

It says the following:


"UTF-8 has several convenient properties:
(...)
2. A Unicode string is turned into a sequence of bytes containing no embedded 
zero bytes. This avoids byte-ordering issues, and means UTF-8 strings can be 
processed by C functions such as strcpy() and sent through protocols that can’t 
handle zero bytes."

This is not right.  UTF-8 uses the zero byte to represent the Unicode code point 
U+0000 (the ASCII NUL character).  This is a valid character in UTF-8 and is 
handled just fine by Python's UTF-8 string encoding/decoding, so a UTF-8 encoded 
string contains an embedded zero byte whenever the text contains U+0000.
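
To illustrate, here is a minimal sketch that can be pasted into a Python 3 
interpreter. The helper name has_zero_byte is made up for this example and is 
not part of any library; the sample strings are arbitrary.

```python
def has_zero_byte(text, encoding="utf-8"):
    """Hypothetical helper: True if the encoded form contains a 0x00 byte."""
    return 0 in text.encode(encoding)


# U+0000 encodes to a single zero byte in UTF-8 and round-trips cleanly.
encoded = "a\x00b".encode("utf-8")
roundtrip = encoded.decode("utf-8")

# Text without U+0000 never yields a zero byte in UTF-8, because every byte
# of a multi-byte sequence has its high bit set.
non_nul_sample = "h\u00e9llo \u20ac \U0001F600"

print(encoded)                               # the bytes for "a", NUL, "b"
print(has_zero_byte("a\x00b"))               # text containing U+0000
print(has_zero_byte(non_nul_sample))         # text without U+0000
print(has_zero_byte("a", "utf-16-le"))       # contrast: UTF-16 pads ASCII with 0x00
```

So the HOWTO's claim holds only for text that does not contain U+0000; a C 
function such as strcpy() would truncate the encoded string at the first NUL.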

----------
assignee: docs@python
components: Documentation
messages: 341363
nosy: docs@python, mbiggs
priority: normal
severity: normal
status: open
title: Unicode HOWTO incorrectly states that UTF-8 contains no zero bytes
versions: Python 2.7, Python 3.5, Python 3.6, Python 3.7, Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36789>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
