David Hirschfield wrote:
I have a pair of programs which trade python data back and forth by
pickling up lists of objects on one side (using
pickle.HIGHEST_PROTOCOL), and sending that data over a TCP socket
connection to the receiver, who unpickles the data and uses it.
So far this has been working fine, but I now need a way of separating
multiple chunks of pickled binary data in the stream being sent back and
forth.
Questions:
Is it safe to do what I'm doing? I didn't think there was anything
fundamentally wrong with sending binary pickled data, especially in the
closed, safe environment these programs operate under...but maybe I'm
making a poor assumption?
If there's no chance of malevolent attackers modifying the data stream
then you can safely ignore the otherwise dire consequences of unpickling
arbitrary chunks of data.
I was going to separate the chunks of pickled data with some well-formed
string, but couldn't that string potentially randomly appear in the
pickled data? Do I just pick an extremely
unlikely-to-be-randomly-generated string as the separator? Is there some
string that will definitely NEVER show up in pickled binary data?
I presumed each chunk was of a known structure. Couldn't you just lead
off with a pickled integer saying how many chunks follow?
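A rough sketch of that idea, in case it helps. It relies on the fact
that pickle.load() reads exactly one pickle from a file-like object, so
no separator string is needed. The names send_chunks and recv_chunks are
made up for illustration, and io.BytesIO stands in for the file object
you would get from sock.makefile() on a real connection.

```python
import io
import pickle

def send_chunks(stream, chunks):
    """Write a pickled count, then each chunk as its own pickle."""
    pickle.dump(len(chunks), stream, pickle.HIGHEST_PROTOCOL)
    for chunk in chunks:
        pickle.dump(chunk, stream, pickle.HIGHEST_PROTOCOL)
    stream.flush()

def recv_chunks(stream):
    """Read the count, then that many pickles, one load() per pickle."""
    count = pickle.load(stream)
    return [pickle.load(stream) for _ in range(count)]

# Stand-in for a socket file object (assumption for this sketch).
buf = io.BytesIO()
send_chunks(buf, [[1, 2], {"a": 3}])
buf.seek(0)
received = recv_chunks(buf)
```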
I thought about base64 encoding the data, and then decoding on the
opposite side (like what xmlrpclib does), but that turns out to be a
very expensive operation, which I want to avoid; speed is of the essence
in this situation.
Yes, base64 encodes every three bytes as four characters (six bits per
character), giving you a 33% overhead. Having said that, pickle isn't all that efficient a
representation because it's designed to be portable. If you are using
machines of the same type there are almost certainly faster binary
encodings.
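If you want to see the base64 size overhead for yourself, something
like this would do. The payload here is an arbitrary list chosen only
for illustration; your own data will give a slightly different ratio
because of padding.

```python
import base64
import pickle

# Arbitrary sample payload (an assumption, not the real data).
payload = pickle.dumps(list(range(1000)), pickle.HIGHEST_PROTOCOL)
encoded = base64.b64encode(payload)

# Every 3 input bytes become 4 output characters, rounded up.
expected_len = 4 * ((len(payload) + 2) // 3)
overhead = len(encoded) / len(payload) - 1
print(len(payload), len(encoded), round(overhead, 3))
```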
Is there a reliable way to determine the byte count of some pickled
binary data? Can I rely on len(<pickled data>) == bytes?
Yes: pickle.dumps() returns a string of bytes, not a Unicode object, so
len() of the result is the byte count.
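Given that, one common way to separate the chunks is to send the length
first and then the pickle itself. The following is only a sketch under
some assumptions: a 4-byte unsigned length in network byte order, and
helper names (send_msg, recv_exact, recv_msg) that are invented here.
socket.socketpair() is used just to have two connected sockets to try it
on.

```python
import pickle
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network order

def send_msg(sock, obj):
    """Send one pickle preceded by its byte length."""
    data = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    sock.sendall(HEADER.pack(len(data)) + data)

def recv_exact(sock, n):
    """Keep calling recv() until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise EOFError("connection closed mid-message")
        buf.extend(part)
    return bytes(buf)

def recv_msg(sock):
    """Read the length header, then exactly that many pickle bytes."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return pickle.loads(recv_exact(sock, length))

# Two connected sockets standing in for the real TCP connection.
a, b = socket.socketpair()
send_msg(a, ["spam", 42])
send_msg(a, {"eggs": 1.5})
first = recv_msg(b)
second = recv_msg(b)
a.close()
b.close()
```

Because the receiver always knows how many bytes to expect, it does not
matter what byte values turn up inside the pickled data.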
If bandwidth really is becoming a limitation you might want to consider
using the struct module to represent things more compactly (but this
may be too difficult if the objects being exchanged are at all complex).
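To give a feel for the difference, here is a small comparison. The
record layout (an unsigned 32-bit id followed by two doubles) is purely
hypothetical; you would pick a format string to match whatever your
objects actually contain.

```python
import pickle
import struct

# Hypothetical fixed layout: id (uint32), x (float64), y (float64).
RECORD = struct.Struct("!Idd")

record = (7, 1.5, -2.25)
packed = RECORD.pack(*record)      # fixed-size binary form
unpacked = RECORD.unpack(packed)   # back to a tuple
pickled = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)

print(len(packed), len(pickled))
```

The packed form carries no type information, so both ends have to agree
on the format string in advance.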
regards
Steve