On 07/17/2010 06:21 PM, raj wrote:
> Hi,
>
> I am using 64 bit Python on an x86_64 platform (Fedora 13). I have
> some code that uses the python marshal module to serialize some
> objects to files. However, in moving the code to python 3 I have come
> across a situation where, if more than one object has been serialized
> to a file, then while trying to de-serialize only the first object is
> de-serialized. Trying to de-serialize the second object raises an
> EOFError. De-serialization of multiple objects works fine in Python
> 2.x. I tried going through the Python 3 documentation to see if
> marshal functionality has been changed, but haven't found anything to
> that effect. Does anyone else see this problem? Here is some
> example code:

Interesting. I modified your script a bit:

0:pts/2:/tmp% cat marshtest.py
from __future__ import print_function
import marshal
import sys

if sys.version_info[0] == 3:
    bytehex = lambda i: '%02X ' % i
else:
    bytehex = lambda c: '%02X ' % ord(c)

numlines = 1
numwords = 25

stream = open('fails.mar','wb')
marshal.dump(numlines, stream)
marshal.dump(numwords, stream)
stream.close()

tmpstream = open('fails.mar', 'rb')
for byte in tmpstream.read():
    sys.stdout.write(bytehex(byte))
sys.stdout.write('\n')
tmpstream.seek(0)

print('pos:', tmpstream.tell())
value1 = marshal.load(tmpstream)
print('val:', value1)
print('pos:', tmpstream.tell())

value2 = marshal.load(tmpstream)
print('val:', value2)
print('pos:', tmpstream.tell())

print(value1 == numlines)
print(value2 == numwords)

0:pts/2:/tmp% python2.6 marshtest.py
69 01 00 00 00 69 19 00 00 00
pos: 0
val: 1
pos: 5
val: 25
pos: 10
True
True
0:pts/2:/tmp% python3.1 marshtest.py
69 01 00 00 00 69 19 00 00 00
pos: 0
val: 1
pos: 10
Traceback (most recent call last):
  File "marshtest.py", line 29, in <module>
    value2 = marshal.load(tmpstream)
EOFError: EOF read where object expected
1:pts/2:/tmp%

So, the contents of the file are identical in both cases, but the file
position after the first load differs: Python 2 stops at offset 5, having
read only the bytes it needed, while Python 3 is already at offset 10,
i.e. it has consumed the whole file. The second load then finds nothing
left to read and raises EOFError.

This looks like a simple optimisation: read the whole file at once,
instead of byte-by-byte, to improve performance when reading large
objects. (such as Python modules...)

The question is: was storing multiple objects in sequence an intended
use of the marshal module? I doubt it. You can always wrap your data in
tuples or use pickle.

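For what it's worth, here is a minimal sketch of the two workarounds just mentioned. It is only an illustration: the file names works.mar and works.pkl and the temporary directory are made up for the example, and the variable names are borrowed from the original script. The first half marshals both values as one tuple, so only a single marshal.load() call is needed; the second half uses pickle, which supports several consecutive dump()/load() calls on the same file.

```python
import marshal
import os
import pickle
import tempfile

numlines = 1
numwords = 25

# Hypothetical scratch location, used only for this example.
workdir = tempfile.mkdtemp()

# Workaround 1: marshal a single tuple, so the file holds exactly one object.
mar_path = os.path.join(workdir, 'works.mar')
with open(mar_path, 'wb') as stream:
    marshal.dump((numlines, numwords), stream)
with open(mar_path, 'rb') as stream:
    m_lines, m_words = marshal.load(stream)

# Workaround 2: pickle allows several objects to be written and read in sequence.
pkl_path = os.path.join(workdir, 'works.pkl')
with open(pkl_path, 'wb') as stream:
    pickle.dump(numlines, stream)
    pickle.dump(numwords, stream)
with open(pkl_path, 'rb') as stream:
    p_lines = pickle.load(stream)
    p_words = pickle.load(stream)

print((m_lines, m_words) == (numlines, numwords))
print((p_lines, p_words) == (numlines, numwords))
```

The tuple approach keeps the marshal format but means the whole group has to be written in one go; pickle is the better fit if objects need to be appended one at a time.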
>
> bash-4.1$ cat marshaltest.py
> import marshal
>
> numlines = 1
> numwords = 25
>
> stream = open('fails.mar','wb')
> marshal.dump(numlines, stream)
> marshal.dump(numwords, stream)
> stream.close()
>
> tmpstream = open('fails.mar', 'rb')
> value1 = marshal.load(tmpstream)
> value2 = marshal.load(tmpstream)
>
> print(value1 == numlines)
> print(value2 == numwords)
>
>
> Here are the results of running this code
>
> bash-4.1$ python2.7 marshaltest.py
> True
> True
>
> bash-4.1$ python3.1 marshaltest.py
> Traceback (most recent call last):
>   File "marshaltest.py", line 13, in <module>
>     value2 = marshal.load(tmpstream)
> EOFError: EOF read where object expected
>
> Interestingly the file created by using Python 3.1 is readable by both
> Python 2.7 as well as Python 2.6 and both objects are successfully
> read.
>
> Cheers,
> raj

-- 
http://mail.python.org/mailman/listinfo/python-list