On 11/27/2011 9:33 AM, Irmen de Jong wrote:
Hi,
A bytearray is pickled (using max protocol) as follows:

pickletools.dis(pickle.dumps(bytearray([255]*10),2))
     0: \x80 PROTO      2
     2: c    GLOBAL     '__builtin__ bytearray'
    25: q    BINPUT     0
    27: X    BINUNICODE u'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
    52: q    BINPUT     1
    54: U    SHORT_BINSTRING 'latin-1'
    63: q    BINPUT     2
    65: \x86 TUPLE2
    66: q    BINPUT     3
    68: R    REDUCE
    69: q    BINPUT     4
    71: .    STOP

bytearray("\xff"*10).__reduce__()
(<type 'bytearray'>, (u'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff', 'latin-1'), None)


Is there a particular reason it is encoded so inefficiently? Most notably, the actual
*bytes* in the bytearray are represented by a UTF-8 string. This needs to be
transformed into a unicode string and then encoded back into bytes when unpickled. The
thing being a bytearray, I would expect it to be pickled as such: a sequence of bytes,
possibly converted back to a bytearray using the constructor that takes the bytes
directly (BINSTRING/BINBYTES pickle opcodes).

The above occurs both on Python 2.x and 3.x.

Any ideas? Candidate for a patch?

Possibly. The two developers listed as particularly interested in pickle
are 'alexandre.vassalotti,pitrou' (antoine), so if you do open a tracker
issue, add them as nosy.
Take a look at http://www.python.org/dev/peps/pep-3154/
by Antoine Pitrou or forward your message to him.
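
If it helps when writing up the tracker issue, the overhead can be shown with a
few lines. This is only a rough sketch, assuming Python 3 and the same ten 0xff
bytes as in the disassembly above; the variable names are made up for
illustration. It reproduces by hand the latin-1 decode and UTF-8 encode that the
protocol 2 pickle goes through, rather than calling any pickle internals.

```python
import pickle

# Same payload as in the disassembly: ten 0xff bytes.
data = bytearray([255] * 10)

# The reduce tuple carries the bytes as a latin-1 decoded text string ...
text = bytes(data).decode("latin-1")

# ... and BINUNICODE stores that text as UTF-8, where each U+00FF
# code point takes two bytes instead of one.
utf8_payload = text.encode("utf-8")

# On unpickling, the constructor is called as bytearray(text, 'latin-1'),
# which encodes the text back into the original bytes.
rebuilt = bytearray(text, "latin-1")

# A full round trip through protocol 2 for comparison.
restored = pickle.loads(pickle.dumps(data, 2))

print(len(data), len(utf8_payload))  # 10 20
print(rebuilt == data, restored == data)
```

The 20-byte payload agrees with the offsets in the disassembly: BINUNICODE starts
at 27 and the next opcode at 52, which is 1 opcode byte, 4 length bytes and 20
bytes of data.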

--
Terry Jan Reedy
