John Machin <[email protected]> added the comment:
@ezio.melotti: Your second sentence is true, but it is not the whole truth.
Bytes in the range C0-FF (whose high bit *is* set) ALSO shouldn't be considered
part of the sequence because they (like 00-7F) are invalid as continuation
bytes; they are either starter bytes (C2-F4) or invalid for any purpose (C0-C1
and F5-FF). Further, bytes in the range 80-BF are NOT always valid as the
first continuation byte; that depends on which starter byte they follow
(after E0, for example, only A0-BF is valid).
The simple way of summarising the above is to say that a byte that is not a
valid continuation byte in the current state ("failing byte") is not a part of
the current (now known to be invalid) sequence, and the decoder must try again
("resync") with the failing byte.
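
To make the rule concrete, here is a minimal sketch in Python of a decoder
that resyncs on the failing byte. It is only an illustration, not the
CPython decoder: the names first_continuation_range and decode_with_resync
are hypothetical, and emitting one U+FFFD per invalid sequence is an
assumption about the desired "replace" behaviour.

```python
# Sketch only: hypothetical helper names, not CPython's implementation.

def first_continuation_range(starter):
    """Return (sequence_length, lo, hi) for a starter byte, where lo..hi is
    the valid range for the FIRST continuation byte; None if the byte is
    not a valid starter (80-C1 and F5-FF)."""
    if 0xC2 <= starter <= 0xDF:
        return 2, 0x80, 0xBF
    if starter == 0xE0:
        return 3, 0xA0, 0xBF   # 80-9F would be an overlong form
    if starter == 0xED:
        return 3, 0x80, 0x9F   # A0-BF would encode a surrogate
    if 0xE1 <= starter <= 0xEF:
        return 3, 0x80, 0xBF
    if starter == 0xF0:
        return 4, 0x90, 0xBF   # 80-8F would be an overlong form
    if starter == 0xF4:
        return 4, 0x80, 0x8F   # 90-BF would exceed U+10FFFF
    if 0xF1 <= starter <= 0xF3:
        return 4, 0x80, 0xBF
    return None


def decode_with_resync(data):
    """Decode bytes as UTF-8, emitting U+FFFD for each invalid sequence and
    restarting ("resync") at the failing byte rather than skipping it."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # ASCII
            out.append(chr(b))
            i += 1
            continue
        info = first_continuation_range(b)
        if info is None:                  # stray continuation or invalid byte
            out.append("\ufffd")
            i += 1
            continue
        length, lo, hi = info
        j = i + 1
        while j < i + length and j < n:
            if not (lo <= data[j] <= hi):
                break                     # data[j] is the failing byte
            lo, hi = 0x80, 0xBF           # later continuation bytes: 80-BF
            j += 1
        if j == i + length:               # complete, valid sequence
            out.append(data[i:j].decode("utf-8"))
        else:                             # invalid or truncated sequence
            out.append("\ufffd")
        i = j                             # failing byte is decoded again
    return "".join(out)
```

With this sketch, decode_with_resync(b"\xe1\x80A") gives one U+FFFD followed
by "A": the "A" ends the invalid sequence but is not consumed by it.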
Do you agree with my example 3?
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue8271>
_______________________________________