On 12/20/13 8:06 PM, Mark Lawrence wrote:
Quoting from http://docs.python.org/3/library/functions.html#bytearray

"The bytearray type is a mutable sequence of integers in the range 0 <=
x < 256."

Quoting from http://docs.python.org/3/library/stdtypes.html#bytes-methods

"Whenever a bytes or bytearray method needs to interpret the bytes as
characters (e.g. the is...() methods, split(), strip()), the ASCII
character set is assumed (text strings use Unicode semantics).

Note - Using these ASCII based methods to manipulate binary data that is
not stored in an ASCII based format may lead to data corruption.

The search operations (in, count(), find(), index(), rfind() and
rindex()) all accept both integers in the range 0 to 255 (inclusive) as
well as bytes and byte array sequences.

Changed in version 3.3: All of the search methods also accept an integer
in the range 0 to 255 (inclusive) as their first argument."

I don't understand why the docs talk about "a mutable sequence of
integers" but then discuss "needs to interpret the bytes as characters".

When called with no arguments, the split and strip methods operate on whitespace. A byte is just an integer and has no notion of whitespace, but a character does, so the bytes have to be interpreted as characters. Likewise, the is* methods (isalnum, isalpha, isdigit, islower, isspace, istitle, isupper) all classify characters, so the bytes must be interpreted as characters there too.
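
As a rough illustration of that ASCII interpretation, here is a small sketch. The sample byte strings and variable names are made up for the example; only the bytes methods themselves come from the docs quoted above.

```python
# Hypothetical sample data, chosen only to show the behaviour.
data = b"  spam\teggs\nham  "

# With no argument, strip() and split() treat ASCII whitespace bytes
# (space, tab, newline, ...) as the thing to strip or split on.
stripped = data.strip()
words = data.split()

# The is* methods classify each byte as if it were an ASCII character.
digits_ok = b"12345".isdigit()
space_ok = b" \t\n".isspace()

# Bytes outside the ASCII range are never alphabetic, whatever
# encoding produced them: here the two UTF-8 bytes of "é".
non_ascii_alpha = "é".encode("utf-8").isalpha()

print(stripped, words, digits_ok, space_ok, non_ascii_alpha)
```

The last line is the case the docs' note warns about: the methods only know ASCII, so non-ASCII data is not treated as text.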

Further, I don't understand why the changes done in 3.3 referred to
above haven't also been applied to (say) the split method.  If I can
call find to look for a zero, why can't I split on it?


I don't know the reason, but I would guess either no one considered it, or it was deemed unlikely to be useful.

If you want to split on a zero byte, you can use bytestring.split(bytes([0])), but that doesn't explain why find accepts a plain integer 0 while split requires a bytestring containing the zero byte.
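
A minimal sketch of the asymmetry, assuming Python 3.3 or later. The data and variable names are invented for the example.

```python
# Hypothetical binary data containing two zero bytes.
data = bytes([1, 2, 0, 3, 4, 0, 5])

# find() accepts a bare integer in the range 0 to 255.
first_zero = data.find(0)

# split() needs a bytes-like separator, so wrap the integer in a
# one-element list. Note bytes([0]) is b"\x00", while bytes(0) would
# be the empty bytes object, which is not a valid separator.
parts = data.split(bytes([0]))

# Passing the bare integer as the separator raises TypeError.
try:
    data.split(0)
except TypeError:
    split_on_int_failed = True
else:
    split_on_int_failed = False

print(first_zero, parts, split_on_int_failed)
```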

--
Ned Batchelder, http://nedbatchelder.com

--
https://mail.python.org/mailman/listinfo/python-list
